Methods and compositions for high yield production of eukaryotic proteins

ABSTRACT

The present invention provides an isolated nucleic acid comprising a first nucleotide sequence encoding an amino acid sequence comprising at least three positively charged amino acid residues, positioned upstream and in frame with a second nucleotide sequence encoding a protein. In addition, the present invention provides an isolated nucleic acid comprising a first nucleotide sequence encoding a DNA binding protein, positioned upstream and in frame with a second nucleotide sequence encoding a protein. An isolated nucleic acid is also provided, which comprises a first nucleotide sequence encoding a bacteriophage lambda repressor protein, positioned upstream and in frame with a second nucleotide sequence encoding a protein. The present invention further provides a method of producing a eukaryotic protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes a eukaryotic protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the eukaryotic protein. A method of producing a eukaryotic protein in a bacterial cell in high yield is also provided, comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes a eukaryotic protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the eukaryotic protein in high yield.

[0001] This application claims priority to U.S. provisional application Serial No. 60/081,989, filed Apr. 16, 1998, and the 60/081,989 application is herein incorporated by this reference in its entirety.

[0002] This invention was made with government support under grant number DK46205 awarded by the National Institute of Diabetes and Digestive and Kidney Diseases and grant number GM15431 awarded by the National Institute of General Medical Sciences of the National Institutes of Health. The government has certain rights in the invention.

BACKGROUND OF THE INVENTION

[0003] 1. Field of the Invention

[0004] The present invention relates to compositions and methods for the high yield production of eukaryotic proteins, in particular membrane proteins, by expression in bacterial cells of recombinant vectors designed for such high yield production.

[0005] 2. Background Art

[0006] Certain classes of eukaryotic, prokaryotic and viral proteins, including membrane proteins, that are needed in large quantities for therapeutic uses as well as for biochemical and structural studies have proven difficult to express in sufficient yields in recombinant systems. This is particularly true of eukaryotic proteins with multiple membrane-spanning regions, including, but not limited to, G-protein coupled receptors (GPCRs) and ion channels derived from eukaryotic cells (Goeddel, 1990).

[0007] Eukaryotic membrane proteins have been expressed in a number of eukaryotic systems, including mammalian cells, baculovirus systems [up to 55 pmol/mg of protein (125 μg/L of culture); Loisel et al., 1997] and yeast cells (up to 14 pmol/mg membrane protein; Sander et al., 1994). However, none of these approaches has proven successful for the production of large quantities of purified eukaryotic proteins.

[0008] Furthermore, although a number of reports in the literature describe expression of eukaryotic membrane proteins such as GPCRs in prokaryotic cells (e.g., E. coli), none of these systems has proven capable of producing high levels of an intact eukaryotic protein (Table 1). These bacterial cell systems have produced GPCRs at levels of approximately several hundred receptor molecules per cell. None of the systems has produced more than 300 receptors per cell, which corresponds to approximately 5 μg of protein per liter of bacterial culture.

TABLE 1
Expression levels of β adrenergic receptor in E. coli
Leader sequence | Expression level | Reference
LamB | 33 to 225 receptors/cell | Chapot et al., 1990
β-galactosidase | 25 receptors/cell | Marullo et al., 1988
none | 200 receptors/cell | Breyer et al., 1990
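Simple arithmetic converts a receptors-per-cell figure into a mass yield. The Python sketch below is illustrative only. The cell density and receptor molecular weight are hypothetical inputs chosen for the example, not values taken from the cited studies, and the result scales linearly with both.

```python
# Convert a per-cell expression level into micrograms of protein per liter of culture.
# Illustrative sketch: cell density and molecular weight are caller-supplied assumptions.

AVOGADRO = 6.02214076e23  # molecules per mole


def receptors_per_cell_to_ug_per_liter(
    receptors_per_cell: float,
    cells_per_liter: float,
    molecular_weight_da: float,
) -> float:
    """mass (g/L) = receptors/cell * cells/L * MW (g/mol) / Avogadro's number."""
    molecules_per_liter = receptors_per_cell * cells_per_liter
    grams_per_liter = molecules_per_liter * molecular_weight_da / AVOGADRO
    return grams_per_liter * 1e6  # grams -> micrograms


if __name__ == "__main__":
    # Hypothetical inputs: 2.5e11 cells per liter and a ~40 kDa receptor.
    ug = receptors_per_cell_to_ug_per_liter(300, 2.5e11, 40_000)
    print(f"{ug:.1f} ug/L")  # 5.0 ug/L under these assumptions
```

With these assumed inputs, 300 receptors/cell works out to roughly 5 μg/L. A denser culture or a larger receptor would raise the figure proportionally.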

[0009] The present invention overcomes previous shortcomings associated with high yield production of eukaryotic proteins by providing compositions and methods for producing eukaryotic proteins, in particular membrane proteins, in high yield (i.e., at least 100 μg protein/L of culture) for use in biochemical and structural studies and as therapeutic agents.

BRIEF DESCRIPTION OF THE DRAWING

[0010] FIG. 1. Western blot analysis of PGE₂ EP₂-cI fusion proteins produced from nucleic acid constructs comprising: no leader sequence (cI⁰); a leader sequence consisting of amino acids 1-15 of the lambda cI repressor protein (cI¹⁻¹⁵); a leader sequence consisting of amino acids 1-22 of the lambda cI repressor protein (cI¹⁻²²); a leader sequence consisting of amino acids 1-36 of the lambda cI repressor protein (cI¹⁻³⁶); and a leader sequence consisting of amino acids 1-76 of the lambda cI repressor protein (cI¹⁻⁷⁶). The PGE₂ EP₂-cI fusion proteins were produced from a construct having a T7 promoter, nucleic acid encoding one of the leader sequences described above and nucleic acid encoding the PGE₂ EP₂ protein. Proteins transferred to nitrocellulose were probed with an affinity-purified sheep anti-PGE₂ EP₂ antibody and a secondary anti-sheep antibody conjugated to horseradish peroxidase, and then reacted with substrate according to standard methods to produce a luminescent reaction product.

SUMMARY OF THE INVENTION

[0011] The present invention provides an isolated nucleic acid comprising a first nucleotide sequence encoding an amino acid sequence comprising at least three positively charged amino acid residues, positioned upstream and in frame with a second nucleotide sequence encoding a protein.

[0012] In addition, the present invention provides an isolated nucleic acid comprising a first nucleotide sequence encoding a DNA binding protein, positioned upstream and in frame with a second nucleotide sequence encoding a protein.

[0013] An isolated nucleic acid is also provided which comprises a first nucleotide sequence encoding a bacteriophage lambda repressor protein, positioned upstream and in frame with a second nucleotide sequence encoding a protein.
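The requirement that a first nucleotide sequence be "upstream and in frame" with a second one can be checked mechanically. The Python function below is a minimal sketch under several assumptions, none taken from the constructs described here:

- The leader begins with an ATG start codon.
- Both parts are plain A/C/G/T strings.
- Promoters and ribosome binding sites are ignored.

```python
# Check that a leader coding sequence can be fused upstream and in frame with a
# protein coding sequence. Illustrative criteria only (assumptions listed above).

STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def codons(dna: str) -> list[str]:
    """Split a sequence into consecutive triplets starting at the first base."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def is_in_frame_fusion(leader_dna: str, protein_dna: str) -> bool:
    leader = leader_dna.upper()
    protein = protein_dna.upper()
    if set(leader + protein) - set("ACGT"):
        return False  # non-nucleotide characters present
    if not leader.startswith("ATG"):
        return False  # translation is assumed to start at the leader's ATG
    if len(leader) % 3 or len(protein) % 3:
        return False  # a length not divisible by 3 would shift the downstream frame
    # An in-frame stop codon inside the leader would end translation before the protein.
    return not any(c in STOP_CODONS for c in codons(leader))


if __name__ == "__main__":
    print(is_in_frame_fusion("ATGAAAAAAAGA", "GCTGCT"))  # True
    print(is_in_frame_fusion("ATGAAAAA", "GCTGCT"))      # False: 8-nt leader shifts the frame
```

A fuller check would also confirm that the protein coding sequence has no premature stops and that a stop codon terminates the fusion. Those checks are omitted for brevity.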

[0014] Further provided in this invention is an isolated nucleic acid having the nucleotide sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, and SEQ ID NO:6.

[0015] The present invention further provides a method of producing a eukaryotic protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes a eukaryotic protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the eukaryotic protein.

[0016] A method of producing a eukaryotic protein in a bacterial cell in high yield is also provided, comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes a eukaryotic protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the eukaryotic protein in high yield.

[0017] Additionally, the present invention provides a method of producing a eukaryotic integral membrane protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes a eukaryotic integral membrane protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the eukaryotic integral membrane protein.

[0018] Furthermore, the present invention provides a method of producing a eukaryotic G-protein coupled receptor protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes a eukaryotic G-protein coupled receptor protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the eukaryotic G-protein coupled receptor protein.

[0019] Additionally provided is a method of producing a eukaryotic ion channel protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes a eukaryotic ion channel protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the eukaryotic ion channel protein.

[0020] The present invention also provides a method of producing a rabbit prostaglandin (PG) E₂ EP₃ receptor protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes the rabbit prostaglandin E₂ EP₃ receptor protein, into the bacterial cell; and b) culturing the cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the rabbit prostaglandin E₂ EP₃ receptor protein.

[0021] The present invention further provides a method of producing a human prostaglandin E₂ EP₂ receptor protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes the human prostaglandin E₂ EP₂ receptor protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the human prostaglandin E₂ EP₂ receptor protein.

[0022] Also provided is a method of producing a human chemokine receptor CCR-5 protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes the human chemokine receptor CCR-5 protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the human chemokine receptor CCR-5 protein.

[0023] In addition, the present invention provides a method of producing a human β₂ adrenergic receptor protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes the human β₂ adrenergic receptor protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the human β₂ adrenergic receptor protein.

[0024] The present invention further provides a method of producing a rat renal outer medullary K⁺ channel protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes the rat renal outer medullary K⁺ channel protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the rat renal outer medullary K⁺ channel protein.

[0025] Finally provided is a method of producing a human small G-protein rho protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes the small G-protein rho protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the small G-protein rho protein.

[0026] Various other objectives and advantages of the present invention will become apparent from the following detailed description.

DETAILED DESCRIPTION OF THE INVENTION

[0027] As used herein, “a” or “an” can mean one or more. For example, “a cell” can mean one cell or more than one cell.

[0028] The present invention provides an isolated nucleic acid comprising a first nucleotide sequence encoding an amino acid sequence comprising at least three positively charged amino acid residues, positioned upstream and in frame with a second nucleotide sequence encoding a protein. As used herein, an “amino acid sequence comprising at least three positively charged residues” means an amino acid sequence having three or more positively charged residues (e.g., arginine, lysine, etc.), which can be consecutive, closely spaced, or randomly spaced. This amino acid sequence is the “leader sequence” of the fusion protein made by the methods of this invention. Neither the leader sequence nor the overall protein sequence (i.e., the leader sequence and the protein sequence together) needs to have a net positive charge (i.e., a pI value > 7). The leader sequence can be as short as five amino acids (aa), and short leaders such as a 15 aa sequence can be used, although longer sequences (e.g., about 36 to 76 amino acids) are preferred. For example, the leader sequence of the fusion protein of this invention can comprise an amino acid sequence of a DNA binding protein, such as the bacteriophage lambda repressor protein.
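The charge criterion for a leader sequence reduces to a simple count. The Python sketch below counts Lys and Arg. It can optionally count His, which is an assumption, since His is only partially protonated at neutral pH. It also computes a crude net charge to show that a qualifying leader need not be net positive. The example sequence is hypothetical.

```python
# Check the "at least three positively charged residues" criterion for a candidate leader.
# Assumption: Lys (K) and Arg (R) count as positive; His (H) only if requested.

ACIDIC = frozenset("DE")


def count_positive(seq: str, include_histidine: bool = False) -> int:
    positive = {"K", "R"} | ({"H"} if include_histidine else set())
    return sum(1 for aa in seq.upper() if aa in positive)


def crude_net_charge(seq: str) -> int:
    """(K + R) - (D + E); ignores termini, His and local pKa shifts."""
    s = seq.upper()
    return count_positive(s) - sum(1 for aa in s if aa in ACIDIC)


def qualifies_as_leader(seq: str, min_positive: int = 3, include_histidine: bool = False) -> bool:
    return count_positive(seq, include_histidine) >= min_positive


if __name__ == "__main__":
    leader = "MKDEKEDRE"  # hypothetical: 3 positive and 5 acidic residues
    print(qualifies_as_leader(leader), crude_net_charge(leader))  # True -2
```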

[0029] Thus, the present invention further provides an isolated nucleic acid comprising a first nucleotide sequence encoding a DNA binding protein, positioned upstream and in frame with a second nucleotide sequence encoding a protein. As used herein, a “DNA binding protein” means a protein which, in its native setting, binds DNA and regulates its function. The DNA binding protein can be selected from the group consisting of eukaryotic DNA binding proteins, prokaryotic DNA binding proteins and bacteriophage-derived DNA binding proteins. For example, the DNA binding proteins of this invention can include, but are not limited to: bacteriophage DNA binding proteins such as the lambda (λ) repressor, the λ cro repressor, the phage P22 arc repressor and the phage P22 mnt repressor; bacterial DNA binding proteins such as the lac repressor and the trp repressor; eukaryotic DNA binding proteins such as the yeast GAL4 protein; mammalian transcription factors such as fos and jun, as well as histones; transcriptional activators such as CREB; and any other DNA binding protein now known or later identified.

[0030] One of skill in the art will also appreciate that the DNA binding proteins of this invention can include fragments of such proteins that retain at least three positively charged residues.

[0031] Also provided is an isolated nucleic acid comprising a first nucleotide sequence encoding a bacteriophage lambda repressor protein (236 amino acids, as shown in SEQ ID NO:9), positioned upstream and in frame with a second nucleotide sequence encoding a protein. In addition, the first nucleotide sequence of the nucleic acid of this invention can encode the N-terminal domain of the bacteriophage lambda repressor protein (amino acids 1-92, as shown in SEQ ID NO:10), amino acids 1-76 of the bacteriophage lambda repressor protein (SEQ ID NO:11), or at least 15 contiguous amino acids of the N-terminal domain of the bacteriophage lambda repressor protein.
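The leader variants described here, including the cI¹⁻¹⁵ to cI¹⁻⁷⁶ leaders of FIG. 1, are N-terminal prefixes of the repressor sequence. The Python sketch below enumerates such prefixes and checks whether a residue range qualifies as "at least 15 contiguous amino acids of the N-terminal domain". SEQ ID NO:9 is not reproduced in this section, so the example uses a hypothetical 236-residue placeholder string. The function names are illustrative.

```python
# Enumerate N-terminal leader fragments of a repressor sequence (1-based, inclusive ends).
# Assumption: the caller supplies the full sequence; a placeholder is used in the example.

N_TERMINAL_DOMAIN_END = 92  # residues 1-92 form the N-terminal DNA binding domain
MIN_CONTIGUOUS = 15


def n_terminal_fragments(full_sequence: str, ends: list[int]) -> dict[str, str]:
    """Map labels such as 'cI 1-76' to residues 1..end of full_sequence."""
    fragments = {}
    for end in ends:
        if not 1 <= end <= len(full_sequence):
            raise ValueError(f"end {end} outside 1..{len(full_sequence)}")
        fragments[f"cI 1-{end}"] = full_sequence[:end]
    return fragments


def is_nterm_window(start: int, end: int) -> bool:
    """True if residues start..end lie within the N-terminal domain and span >= 15 residues."""
    return 1 <= start <= end <= N_TERMINAL_DOMAIN_END and end - start + 1 >= MIN_CONTIGUOUS


if __name__ == "__main__":
    placeholder = "X" * 236  # hypothetical stand-in for SEQ ID NO:9
    frags = n_terminal_fragments(placeholder, [15, 22, 36, 76, 92])
    print({label: len(seq) for label, seq in frags.items()})
```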

[0032] The λ repressor protein is the product of the cI gene of bacteriophage λ. This gene encodes a protein of 236 amino acids organized into two domains: an N-terminal DNA binding domain consisting of amino acids 1-92 and a C-terminal domain consisting of amino acids ~114-236 (Sauer, 1978; Sauer et al., 1979).

[0033] In the nucleic acid of this invention as described above, the second nucleotide sequence can encode any protein which can be produced exogenously in a bacterial protein expression system. For example, the protein of this invention can be a viral protein, a prokaryotic protein or a eukaryotic protein.

Viral proteins encoded by the second nucleotide sequence of the nucleic acid of this invention can include, but are not limited to:
- homologs of G-protein coupled receptor proteins from cytomegalovirus, herpesvirus 6, herpesvirus 7, Kaposi's sarcoma-associated herpesvirus (human herpesvirus 8) and herpesvirus saimiri (e.g., the gene product of ECRF3);
- human immunodeficiency virus (HIV) proteins gp120 and gp41;
- measles virus F protein;
- influenza hemagglutinin protein;
- herpesvirus B and H proteins.

Prokaryotic proteins encoded by the second nucleotide sequence of the nucleic acid of this invention can include, but are not limited to, the bacterial membrane protein diacylglycerol kinase and the lamB gene product.

Eukaryotic proteins encoded by the second nucleotide sequence of the nucleic acid of the present invention can include, but are not limited to, eukaryotic proteins selected from the group consisting of integral membrane proteins, G-protein coupled receptor (GPCR) proteins and ion channel proteins. Integral membrane proteins are proteins which have at least one hydrophobic amino acid sequence that passes through the membrane lipid bilayer as a transmembrane region or domain. GPCRs are a superfamily of integral membrane proteins which are widely distributed in eukaryotic cells and consist of seven transmembrane domains interconnected by a series of peptide loops. In their native environment, these proteins bind ligand at their exofacial surface and transmit a signal to the intracellular side via heterotrimeric guanine-nucleotide binding proteins (G-proteins). Ion channel proteins are integral membrane proteins which, in their native environment (the plasma membrane of virtually all cell types), form a pore in the lipid bilayer that allows the selective passage of one or more ions into or out of the cell.
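Integral membrane proteins are defined above by a hydrophobic stretch that crosses the bilayer, and such stretches are often screened for with a Kyte-Doolittle hydropathy plot. The Python sketch below applies that commonly used heuristic (19-residue window, mean hydropathy above 1.6). It is a substitute technique, not a method of this invention, and it flags candidate transmembrane segments only approximately.

```python
# Kyte-Doolittle hydropathy scan for candidate transmembrane segments.
# Standard heuristic (19-residue window, mean > 1.6); not a method described in this text.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; raises KeyError on non-standard residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge overlapping above-threshold windows into 1-based inclusive residue ranges."""
    segments: list[tuple[int, int]] = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h <= threshold:
            continue
        start, end = i + 1, i + window  # residues covered by this window
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)  # extend the current segment
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    toy = "D" * 10 + "L" * 20 + "D" * 10  # hypothetical: one hydrophobic stretch
    print(candidate_tm_segments(toy))
```

Because whole windows are reported, segment boundaries are coarse. In the toy example the flagged range extends a few residues beyond the 20-residue Leu stretch.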

[0034] Furthermore, the second nucleotide sequence of the nucleic acid of this invention can encode a protein selected from the group consisting of rabbit prostaglandin E₂ EP₃ receptor protein, human prostaglandin E₂ EP₂ receptor protein, human chemokine receptor CCR-5 protein, human β₂ adrenergic receptor protein, rat renal outer medullary K⁺ channel protein and human small G-protein rho protein.

[0035] The present invention additionally provides an isolated nucleic acid having the nucleotide sequence selected from the group consisting of:
- SEQ ID NO:1 (plasmid pLJM5.22His encoding the cI-77A his fusion protein);
- SEQ ID NO:2 (plasmid pCK2.5 HTL encoding cI-EP₂ his-thrombin-lambda repressor C-terminal domain aa 82-236);
- SEQ ID NO:3 (plasmid pSD1.63 his encoding cI-CCR5 his);
- SEQ ID NO:4 (plasmid pSD1.18his encoding cI-βAR his);
- SEQ ID NO:5 (plasmid pSD1.134his encoding cI-ROMK his);
- SEQ ID NO:6 (plasmid pSD2.46his encoding cI-rho his).

[0036] As used herein, “nucleic acid” refers to single- or double-stranded molecules. These may be DNA, comprising two or more nucleotides made up of the bases A, T, C and G, or RNA, made up of the bases A, U (substituted for T), C and G. The nucleic acid may represent a coding strand or its complement. Thus, the present invention also provides nucleic acids complementary to, or capable of hybridizing with, the nucleic acids of this invention. The nucleic acid of this invention may be a naturally occurring nucleic acid. Alternatively, it may be a synthetic nucleic acid sequence containing alternative codons that encode the same amino acids as those found in a naturally occurring sequence (Lewin, 1994). Furthermore, the nucleic acids of this invention can include codons encoding conservative amino acid substitutions that do not alter the function of the protein, as are well known in the art.
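The standard genetic code makes two ideas in this paragraph concrete: the complementary strand, and synthetic sequences that use "alternative codons". The Python sketch below computes a reverse complement and checks whether two coding sequences translate to the same protein. It assumes the standard genetic code and treats U as T, so RNA input is accepted. The function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(coding: str) -> str:
    """Translate a DNA or RNA coding sequence; '*' marks stop codons."""
    seq = coding.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def encodes_same_protein(a: str, b: str) -> bool:
    """True if two coding sequences differ only by synonymous (alternative) codons."""
    return translate(a) == translate(b)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))                           # GCAT
    print(encodes_same_protein("ATGAAACGT", "ATGAAGAGA"))       # True: MKR in both
```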

[0037] As used herein, the term “isolated” means a nucleic acid separated or substantially free from at least some of the other components of the naturally occurring organism, for example, the cell structural components commonly found associated with nucleic acids in a cellular environment and/or other nucleic acids. The isolation of nucleic acids can therefore be accomplished by techniques such as cell lysis followed by phenol plus chloroform extraction, followed by ethanol precipitation of the nucleic acids (Michieli et al., 1996). The nucleic acids of this invention can be isolated from cells according to methods well known in the art. Alternatively, the nucleic acids of the present invention can be synthesized according to standard protocols well described in the literature.

[0038] The nucleic acid of this invention can be part of a recombinant nucleic acid comprising any combination of restriction sites and/or functional elements, as are well known in the art, which facilitate molecular cloning, expression, post-translational modification and other recombinant DNA manipulations. For example, the nucleic acid of this invention can encode a leader sequence fused to a protein sequence to produce a fusion protein, from which the leader sequence can be cleaved to yield only the protein sequence. The nucleic acid of this invention can therefore also include nucleotide sequences encoding amino acid sequences that provide for enzymatic or chemical cleavage of the leader peptide from the mature polypeptide. It can likewise include regulatory sequences that allow temporal regulation of its expression. Thus, the present invention further provides a recombinant nucleic acid comprising the nucleic acid of the present invention. In particular, the present invention provides a vector comprising the nucleic acid of this invention and a cell comprising the vector of this invention.
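Cleaving the leader from the fusion protein can be modeled as locating a protease recognition site and splitting the sequence at the enzyme's cut position. For illustration, the Python sketch below assumes the commonly used thrombin site LVPR↓GS, cut between Arg and Gly. The site used in the constructs of this invention is not specified here, and any other enzymatic or chemical cleavage site could be substituted through the function's parameters.

```python
# Split a fusion protein into leader and released protein at a protease site.
# Assumption: thrombin site LVPR|GS as the example; only the first occurrence is used.

THROMBIN_SITE = "LVPRGS"
THROMBIN_CUT_OFFSET = 4  # thrombin cuts after the R (between R and G)


def cleave_fusion(
    fusion: str,
    site: str = THROMBIN_SITE,
    cut_offset: int = THROMBIN_CUT_OFFSET,
) -> tuple[str, str]:
    """Return (leader_part, released_protein); raises ValueError if the site is absent."""
    idx = fusion.upper().find(site.upper())
    if idx == -1:
        raise ValueError(f"cleavage site {site!r} not found")
    cut = idx + cut_offset
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    leader, protein = cleave_fusion("MKKKLVPRGSAAA")  # hypothetical fusion sequence
    print(leader, protein)  # MKKKLVPR GSAAA
```

A real design would also check that the target protein contains no secondary copy of the site.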

[0039] The vector of this invention can be an expression vector which contains all of the genetic components required for expression of the nucleic acid in cells into which the vector has been introduced, as are well known in the art. The expression vector can be a commercial expression vector or it can be constructed in the laboratory according to standard molecular biology protocols.

[0040] The vector of this invention is introduced into a bacterial cell under conditions whereby the resulting stable transformants maintain the vector, as are well known in the art and as described in the Examples provided herein.

[0041] The vector of this invention is introduced into a bacterial cell according to standard procedures for introducing nucleic acid into prokaryotes, as are well known in the art. Numerous E. coli expression vectors useful for the expression of proteins in prokaryotic systems are known to one of ordinary skill in the art. Other suitable microbial hosts include bacilli, such as Bacillus subtilis; enterobacteria, such as Salmonella and Serratia; and various Pseudomonas species. Expression vectors for prokaryotic systems typically contain expression control sequences compatible with the host cell (e.g., an origin of replication) and an antibiotic resistance marker to provide for the growth and selection of the expression vector in a bacterial host. In addition, any of a variety of well-known promoters can be present, such as the T7 promoter system, the lactose promoter system, a tryptophan (Trp) promoter system, a beta-lactamase promoter system, or a promoter system from phage lambda. The promoters will typically control expression, optionally with an operator sequence, and have ribosome binding site sequences, for example, for initiating and completing transcription and translation. The vector can also contain:
- expression control sequences;
- enhancers that may regulate the transcriptional activity of the promoter;
- appropriate restriction sites to facilitate cloning of inserts adjacent to the promoter;
- other necessary information processing sites, such as RNA splice sites, polyadenylation sites and transcription termination sequences;
- any other sequence which may facilitate the expression of the inserted nucleic acid.

[0042] The nucleic acid in the vector of this invention can be expressed in cells after the nucleotide sequences have been operably linked to (i.e., positioned to ensure the functioning of) an expression control sequence. These expression vectors are typically replicable in the cells either as episomes or as an integral part of the cell's chromosomal DNA. Commonly, expression vectors contain selection markers, e.g., tetracycline resistance, ampicillin resistance, kanamycin resistance or chloramphenicol resistance, to permit detection and/or selection of those bacterial cells transformed with the desired nucleic acid sequences (see, e.g., U.S. Pat. No. 4,704,362).

[0043] Thus, the present invention provides a method of producing a eukaryotic protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes a eukaryotic protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the eukaryotic protein.

[0044] A method of producing a eukaryotic protein in a bacterial cell in high yield is also provided, comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes a eukaryotic protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the eukaryotic protein in high yield. As used herein, “high yield” means that the protein is produced in an amount of at least, and preferably greater than, 100 μg/liter of bacterial culture. More preferably, high yield means that the protein is produced in an amount of at least 0.5 mg/liter of culture. Most preferably, it is produced in an amount of at least 2.0 mg/liter of culture.
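The three "high yield" tiers defined above amount to simple thresholds on milligrams of protein per liter of culture. The Python sketch below encodes them. The helper names are hypothetical, and the caller decides whether the measured amount refers to fusion protein or cleaved protein.

```python
# Classify a measured yield against the "high yield" tiers defined in the text.

HIGH_YIELD_MG_PER_L = 0.1       # 100 ug per liter of culture
MORE_PREFERRED_MG_PER_L = 0.5
MOST_PREFERRED_MG_PER_L = 2.0


def yield_mg_per_liter(total_protein_mg: float, culture_volume_l: float) -> float:
    """Normalize a total amount of protein to milligrams per liter of culture."""
    if culture_volume_l <= 0:
        raise ValueError("culture volume must be positive")
    return total_protein_mg / culture_volume_l


def classify_yield(mg_per_liter: float) -> str:
    if mg_per_liter >= MOST_PREFERRED_MG_PER_L:
        return "most preferred"
    if mg_per_liter >= MORE_PREFERRED_MG_PER_L:
        return "more preferred"
    if mg_per_liter >= HIGH_YIELD_MG_PER_L:
        return "high yield"
    return "below high yield"


if __name__ == "__main__":
    # Hypothetical run: 3.0 mg of protein recovered from 2.0 L of culture.
    print(classify_yield(yield_mg_per_liter(3.0, 2.0)))  # more preferred
```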

[0045] In addition, the present invention provides a method of producing a eukaryotic integral membrane protein in a bacterial cell comprising: a) producing the expression vector of this invention, wherein the second nucleotide sequence encodes a eukaryotic integral membrane protein; b) introducing the vector into the bacterial cell; and c) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the eukaryotic integral membrane protein.

[0046] The present invention also provides a method of producing a eukaryotic G-protein coupled receptor protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes a eukaryotic G-protein coupled receptor protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the eukaryotic G-protein coupled receptor protein.

[0047] Additionally, the present invention provides a method of producing a eukaryotic ion channel protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes a eukaryotic ion channel protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the eukaryotic ion channel protein.

[0048] Furthermore, the present invention provides methods for the production of specific eukaryotic proteins. In particular, the present invention provides a method of producing a rabbit prostaglandin E₂ EP₃ receptor protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes the rabbit prostaglandin E₂ EP₃ receptor protein, into the bacterial cell; and b) culturing the cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the rabbit prostaglandin E₂ EP₃ receptor protein. The rabbit PGE₂ EP₃ receptor protein is a member of the family of GPCRs. In its native environment (e.g., the plasma membrane of cells in the kidney, stomach and adrenal glands, among others), it binds prostaglandin E₂ and elicits intracellular signals (Breyer et al., 1994) via heterotrimeric guanine-nucleotide binding proteins (G-proteins).

[0049] Also provided is a method of producing a human prostaglandin E₂ EP₂ receptor protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes the human prostaglandin E₂ EP₂ receptor protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the human prostaglandin E₂ EP₂ receptor protein. The human PGE₂ EP₂ receptor is a member of the family of GPCRs. In its native environment (e.g., the plasma membrane of lung, uterine and blood cells, among others), it binds prostaglandin E₂ and elicits intracellular signals via heterotrimeric guanine-nucleotide binding proteins (G-proteins).

[0050] Further provided is a method of producing a human chemokine receptor CCR-5 protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes the human chemokine receptor CCR-5 protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the human chemokine receptor CCR-5 protein. The human chemokine receptor CCR-5 (alternatively named CC-CKR5) is a member of the family of GPCRs. In its native environment (the plasma membrane of T cells and macrophages, among others), it binds chemokine peptide hormones at the exofacial surface of the cell and elicits intracellular signals on the intracellular side of the plasma membrane via heterotrimeric guanine-nucleotide binding proteins (G-proteins). In addition, the human immunodeficiency virus (HIV) uses CCR-5 as a co-receptor that facilitates viral entry into the host cell during the pathogenesis of viral infection.

[0051] Production of the chemokine receptor protein CCR-5 in high yield provides for a number of therapeutic uses. For example, the CCR-5 protein can be used as an immunogen to elicit autoantibodies to the CCR-5 protein. This active immunization would then inhibit HIV entry into target cells expressing the CCR-5 receptor. Alternatively, the CCR-5 protein can be used for passive immunization. An animal (e.g., a horse) is immunized with the CCR-5 fusion protein of this invention, and the resulting antiserum is collected and purified. The horse Ig anti-CCR-5 fusion protein fraction can then be administered to humans to inhibit HIV entry into target cells expressing the CCR-5 receptor. In a third use, the CCR-5 protein of this invention can be administered to a subject infected with HIV in an amount which can bind to HIV in the subject and inactivate it.

[0052] In addition, the present invention provides a method of producing a human β₂ adrenergic receptor protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes the human β₂ adrenergic receptor protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the human β₂ adrenergic receptor protein. The human β₂ adrenergic receptor is a member of the family of GPCRs. In its native environment (the plasma membrane of cells in the heart, lungs, blood vessels, intestine and other organs and tissues), it binds epinephrine and its natural and synthetic analogs at its exofacial surface. It then elicits intracellular signals via heterotrimeric guanine-nucleotide binding proteins (G-proteins).

[0053] Furthermore, the present invention provides a method of producing a rat renal outer medullary K⁺ channel (ROMK) protein in a bacterial cell comprising: a) introducing the expression vector of this invention, wherein the second nucleotide sequence encodes the rat renal outer medullary K⁺ channel protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the rat renal outer medullary K⁺ channel protein. ROMK is a member of the family of ion channel proteins which, in its native environment (the plasma membrane of cells in the kidney, brain, heart and stomach), allows the selective secretion of potassium ions from the intracellular milieu.

[0054] The present invention also provides a method of producing a human small G-protein rho protein in a bacterial cell comprising: a) producing the expression vector of this invention, wherein the second nucleotide sequence encodes the small G-protein rho protein; b) introducing the vector into the bacterial cell; and c) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the expression vector is expressed to produce the small G-protein rho protein. Rho is a member of the family of “small GTP binding proteins” which, in its native environment, is membrane-associated but is not an integral membrane protein; i.e., it does not contain any transmembrane regions.

[0055] The methods of this invention can further comprise the step of isolating and purifying the protein according to methods well known in the art and as described herein (see, e.g., Sambrook et al.). Additionally, for all of the methods of the present invention, the eukaryotic protein can be produced in high yield, as defined herein.

[0056] The proteins produced by the methods of this invention can be cleaved from the leader sequence as described herein, refolded and tested for functionality. For example, the proteins of this invention can be refolded according to methods well known in the art for refolding membrane proteins (see, e.g., Braiman et al., 1987). Briefly, the inclusion bodies containing the fusion protein of this invention can be isolated from the bacterial cells in which they were produced and solubilized in buffer containing 0.2% sodium dodecyl sulfate (SDS). The fusion proteins can then be mixed with buffer containing dimyristoyl phosphatidylcholine (DMPC) and CHAPS detergent to allow renaturation of the detergent-solubilized protein.

[0057] The proteins of this invention can be tested for functionality by a variety of methods. For example, the presence of antigenic epitopes and the ability of the proteins to bind ligands can be determined by Western blot assays, fluorescence-activated cell sorting assays, immunoprecipitation, immunochemical assays and/or competitive binding assays, as well as any other assay that measures specific binding activity.

[0058] For ion channel proteins of this invention, the proteins can be reconstituted into artificial lipid bilayers (so-called black lipid bilayers) according to methods well known in the art. The ability of the ion channel to function can then be tested electrophysiologically, using direct current/voltage measurements according to well known methods (Zweifach et al., 1991). Alternatively, the ion channel can be incorporated into liposomes according to standard methods, and activity can be monitored by the ability of ion fluxes to change the fluorescence of indicator dyes contained within the liposomes, as known in the art (Le Caherec et al., 1996; Zeidel et al., 1992).

[0059] GPCRs can be tested for functionality by testing their ability to bind radiolabeled ligand in a specific and saturable manner, as is well known in the art (Limbird, 1996). Moreover, activation of GPCRs leads to a concomitant increase in the enzymatic activity of GTP binding proteins (increased turnover of GTP to GDP). The GPCRs of this invention can therefore also be reconstituted with purified heterotrimeric G-proteins in liposomes, and the rate of GTP hydrolysis upon addition of the cognate agonist can be measured according to well known methods. An increase in rate indicates a properly folded and functioning protein.
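By way of illustration only, the comparison described in the preceding paragraph reduces to two initial rates: the least-squares slope of product formed (e.g., released phosphate) versus time with and without agonist, and their ratio. The readings below are hypothetical, and the helper names `initial_rate` and `fold_stimulation` are ours; the protocol above does not prescribe a particular readout or fitting method.

```python
def initial_rate(times_min, product_pmol):
    """Least-squares slope (pmol/min) of product formed versus time."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_p = sum(product_pmol) / n
    num = sum((t - mean_t) * (p - mean_p) for t, p in zip(times_min, product_pmol))
    den = sum((t - mean_t) ** 2 for t in times_min)
    return num / den


def fold_stimulation(rate_agonist, rate_basal):
    """Agonist-stimulated rate relative to the basal rate."""
    return rate_agonist / rate_basal


# Hypothetical time courses (minutes, pmol product) for reconstituted receptor.
times = [0, 2, 4, 6, 8]
basal = [0.0, 1.0, 2.0, 3.0, 4.0]
agonist = [0.0, 3.0, 6.0, 9.0, 12.0]
print(fold_stimulation(initial_rate(times, agonist), initial_rate(times, basal)))  # 3.0
```

A ratio above 1 corresponds to the "increase in rate" used above as the indicator of a functional receptor; what counts as a meaningful increase would have to be set experimentally.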

[0060] The proteins of this invention can be used in a number of practical applications including, but not limited to:

[0061] 1. Immunization with recombinant host protein antigen as a viral/pathogen antagonist.

[0062] 2. Production of membrane proteins for diagnostic or screening assays.

[0063] 3. Production of membrane proteins for biochemical studies.

[0064] 4. Production of membrane proteins for structural studies.

[0065] 5. Antigen production for generation of antibodies for immunohistochemical mapping, including mapping of orphan receptors and ion channels.

[0066] In particular, the proteins of this invention can be used as immunogens in vaccine protocols and as antigens for the production of monoclonal and polyclonal antibodies according to methods standard in the art (see, e.g., Harlow and Lane, 1988). Additionally, the proteins of this invention can be used in diagnostic and screening assays to detect the presence, in a biological sample from a subject, of a ligand that binds the protein of this invention. Such assays can be carried out by detecting the binding of the protein to a ligand in the sample. Such detection methods can include identifying the formation of a protein/ligand complex with detectable antibodies that specifically bind the protein and/or the ligand, as well as competitive binding assays as are well known in the art.

[0067] The present invention is more particularly described in the following examples, which are intended as illustrative only since numerous modifications and variations therein will be apparent to those skilled in the art.

EXAMPLES

Example 1 Plasmid pLJM5-22H: Fusion of Rabbit EP₃ Receptor With the N-terminal Domain of λ Repressor.

[0068] Construction of cI-77A fusion E. coli expression vector. Plasmid pLJM5-22H (SEQ ID NO:1) consists of the first 231 bp of a semi-synthetic cI gene (Breyer and Sauer, 1989a), encoding amino acids 1-76 (SEQ ID NO:11), fused to the N-terminus of the EP₃ receptor 77A splice variant. Note that the numbering convention for the λ repressor protein designates the serine encoded at codon 2 of the cI gene as position 1, because the N-terminal methionine is cleaved in the mature λ repressor protein. It is not known whether the initiator methionine is cleaved in the fusion proteins; however, the convention designating serine as amino acid 1 is used for the various fusion proteins described below.

[0069] The C-terminus of the 77A protein was modified to remove the stop codon, and an XhoI restriction site was introduced immediately 3′ to the coding region, allowing a C-terminal fusion. The vector, ptac promoter and synthetic cI gene sequences were derived from the plasmid pRB200 (Breyer and Sauer, 1989a). The synthetic cI gene was digested at its internal EcoRI and XhoI sites, and the 77A cDNA was inserted as follows to produce the plasmid: an EcoRI/NdeI adapter pair [5′-AATTCGCAGCTCA-3′ (SEQ ID NO:16) and 5′-TATGAGCTGCG-3′ (SEQ ID NO:17)] was used to fuse the initiator methionine of 77A from plasmid pRC/CMV 77A wt (Audoly and Breyer, 1997) in frame with amino acid 76 of the cI sequence. The 3′ end of the 77A cDNA was modified by PCR, removing the TGA stop codon and the 3′ untranslated region. An XhoI restriction site was simultaneously introduced, allowing fusion to a C-terminal 6× his tag. The sense primer, 5′-ACA TCA GTT GAG CAC TGC-3′ (SEQ ID NO:18), lies within the 77A coding region. The antisense primer, 5′-CCT CGA GGC TTG CTG ATA AGG ACG AGC-3′ (SEQ ID NO:19), lies at the 3′ end of the coding region. PCR was performed with Vent DNA polymerase for 30 cycles of 94° C. for 15 sec, 51° C. for 15 sec and 72° C. for 30 sec. The PCR fragment was digested with BsmI (a restriction site internal to 77A) and XhoI and ligated into the 77A cDNA backbone, reconstituting the full-length 77A cDNA with an XhoI site at the 3′ end of the coding region. This plasmid was digested at the XhoI site and at a HindIII site 3′ to the EP coding region. The his tag linker oligonucleotides 5′-TCG AGG CAC CAT CAC CAC CAC CAC TGA A-3′ (SEQ ID NO:20) and 5′-AGC TTT CAG TGG TGG TGG TGA TGG TGC C-3′ (SEQ ID NO:21), which encode a 6× his tag, were ligated to the backbone, allowing fusion to a C-terminal 6× his tag used for affinity purification.
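The adapter and linker design above can be checked in a few lines of code. The following is a minimal sketch, not part of the original protocol: it assumes standard 5′ overhangs (4 nt for the EcoRI, XhoI and HindIII ends, 2 nt for the NdeI end), and the helper names `revcomp`, `anneals` and `translate` are hypothetical. It confirms that each oligonucleotide pair forms a duplex and translates the junctions the pairs create.

```python
# Sketch: verify that linker oligonucleotides anneal and translate the
# junctions they create. Sequences are copied from the text with spaces
# removed; overhang lengths are assumptions based on the enzymes used.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built by iterating codons in TCAG order.
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def anneals(top: str, bottom: str, top_overhang: int, bottom_overhang: int) -> bool:
    """True if the strands are complementary once each 5' overhang is removed."""
    return top[top_overhang:] == revcomp(bottom[bottom_overhang:])


def translate(seq: str) -> str:
    """Translate complete codons in frame 1; '*' marks a stop codon."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# EcoRI/NdeI adapter (SEQ ID NO:16 and 17): AATT overhang on top, TA on bottom.
adapter_top, adapter_bottom = "AATTCGCAGCTCA", "TATGAGCTGCG"
# His tag linker (SEQ ID NO:20 and 21): TCGA (XhoI) and AGCT (HindIII) overhangs.
his_top = "TCGAGGCACCATCACCACCACCACTGAA"
his_bottom = "AGCTTTCAGTGGTGGTGGTGATGGTGCC"

print(anneals(adapter_top, adapter_bottom, 4, 2))  # True
print(anneals(his_top, his_bottom, 4, 4))          # True

# Junction cI codons 75-76 (GAA TTC) -> adapter -> 77A start codon, assuming
# the EcoRI end contributes "G" and the NdeI-cut 77A fragment begins "TATG".
junction = "G" + adapter_top + "TATG"
print(translate(junction))  # EFAAHM
print(translate(his_top))   # SRHHHHHH*
```

Under these assumptions, the translation shows Glu-Phe (cI residues 75-76) joined through Ala-Ala-His to the 77A initiator Met, and the XhoI-end linker reading Ser-Arg followed by six His and a stop codon.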

[0070] The NT (No-Tail) EP₃ receptor expression construct, which lacks the C-terminal sequence encoded by the alternatively spliced variable sequence of the 77A splice variant, was generated by PCR using the 77A cDNA as a template. The receptor sequence ends at Q³⁵⁵, 10 residues distal to the end of transmembrane domain VII, and is fused to the 6× his tag at the C-terminus. A PCR fragment was generated using an internal sense primer at nt 532 of the coding region (5′-TGG CTG GCA GTG CTC GCC-3′) (SEQ ID NO:22) and a downstream primer (5′-TCA CCT CGA GGC CTG GCA AAA CTT CCG AAG-3′) (SEQ ID NO:23) that inserts an XhoI site immediately distal to nt 1065 (Q³⁵⁵), allowing amplification of a 534 bp fragment. The PCR was performed with Vent DNA polymerase for 35 cycles of 98° C. for 15 sec, 57° C. for 15 sec and 72° C. for 60 sec. This PCR product was digested at internal SacI and XhoI restriction sites, and the resulting fragment was subcloned into the SacI and XhoI sites of expression vector pLJM5-22H to yield the plasmid pLJM6-09 (SEQ ID NO:7), which expresses the cI-NT-his fusion protein.
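The coordinates in the preceding paragraph follow from codon arithmetic: residue n of an open reading frame occupies nucleotides 3n − 2 to 3n, so the codon for Q³⁵⁵ ends at nt 1065, and the stretch from nt 532 to nt 1065 spans 534 bp. A minimal check is sketched below; it ignores the 5′ tail that the downstream primer adds, and the function names are illustrative only.

```python
def codon_span(residue: int) -> tuple[int, int]:
    """First and last nucleotide (1-based ORF coordinates) encoding a residue."""
    return 3 * residue - 2, 3 * residue


def span_length(start_nt: int, end_nt: int) -> int:
    """Length of an inclusive nucleotide interval."""
    return end_nt - start_nt + 1


print(codon_span(355))                       # (1063, 1065)
print(span_length(532, codon_span(355)[1]))  # 534
```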

[0071] Plasmid pLJM5-42T (SEQ ID NO:8) expresses a third variant of the 77A protein, cI-77A-TL. In addition to the N-terminal cI¹⁻⁷⁶ fusion, the cI-77A-TL construct (for Thrombin-Lambda C-terminal fusion) is fused to amino acids 82-236 of the cI gene through a thrombin cleavage site linker placed between the C-terminal sequence of 77A and the lambda C-terminal fusion. The thrombin cleavage sequence LVPRGS (SEQ ID NO:15) allows cleavage of the C-terminal fusion from the purified recombinant protein. This construct was made by inserting the oligonucleotide pair 5′-TCG ACC CTG GTG CCA CGC GGA TCC GT-3′ (SEQ ID NO:37) and 5′-TCG AAC GGA TCC GCG TGG CAC CAG GC-3′ (SEQ ID NO:24) into the XhoI site at the 3′ end of the 77A fragment outlined above. This allows an in-frame fusion of the 77A receptor to the thrombin cleavage sequence, followed by amino acids 82-236 of the cI gene. This thrombin cleavage sequence, or similar sequences (e.g., an enterokinase cleavage site), can also be inserted between the N-terminal cI¹⁻⁷⁶ sequence and the target protein, allowing the isolation of protein products that lack the fusion peptides.
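Thrombin cuts the LVPRGS linker between Arg and Gly. The sketch below locates that site in a fusion protein sequence and returns the two cleavage products. The example sequence is a short hypothetical stand-in (a few residues, the linker, and a few more residues), not the actual cI-77A-TL sequence, and `thrombin_cleave` is an illustrative name.

```python
THROMBIN_SITE = "LVPRGS"


def thrombin_cleave(protein: str) -> tuple[str, str]:
    """Split a protein at the first LVPR|GS site (thrombin cuts after Arg).

    Raises ValueError if no site is present.
    """
    idx = protein.find(THROMBIN_SITE)
    if idx < 0:
        raise ValueError("no LVPRGS thrombin site found")
    cut = idx + 4  # position just after the Arg of LVPR
    return protein[:cut], protein[cut:]


# Hypothetical fusion: upstream residues + linker + C-terminal partner residues.
upstream, downstream = thrombin_cleave("MKASTLVPRGSGGKR")
print(upstream)    # MKASTLVPR
print(downstream)  # GSGGKR
```

Only the first site is cut here; a construct with linkers at both ends of the target protein, as contemplated above, would need every occurrence handled.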

[0072] Induction of cI-EP₃ fusion protein expression. E. coli strain DH5α cells transformed with the various expression plasmids were grown in 2×LB medium containing 100 μg/ml ampicillin (2×LBA100). Cells were grown in 2×LBA100 with shaking at 37° C. until the culture reached an A₆₀₀ of 0.8. Protein expression was induced by addition of 1 mM isopropyl β-D-thiogalactopyranoside (IPTG), followed by further incubation at 30° C. for 5 hours. Cells were harvested by centrifugation at 2,500×g, flash frozen in liquid nitrogen and stored at −80° C.

[0073] Protein Purification by Ni-NTA Column. Bacterial cell pellets were resuspended in Buffer 1 (50 mM Tris-Cl, 150 mM NaCl, 0.1% NaN₃, 10 mM CHAPS, 20% glycerol, 2 mM PMSF, 1.4 mM β-ME, pH 8.0), followed by sonication three times for 20 sec on ice. The inclusion bodies were collected by centrifugation at 35,000×g for 20 min at 4° C. The supernatant was discarded and the inclusion bodies were washed two more times with Buffer 1 using the same centrifugation protocol. Washed inclusion bodies were dissolved in Buffer 2 (50 mM Tris-Cl, 500 mM NaCl, 1% NP-40, 0.5% Na deoxycholate, 2 mM PMSF, 2 M urea, 20 mM imidazole, pH 8.0). Ni-NTA agarose beads were added and incubated at 4° C. overnight on a rotary shaker. Agarose beads were collected by centrifugation at 1000×g for 2 min and batch washed with 50 volumes of Buffer 2. After three washes, the fusion protein was batch eluted with Buffer 3 (50 mM Tris-Cl, 500 mM NaCl, 1% NP-40, 0.5% Na deoxycholate, 2 mM PMSF, 2 M urea, 100 mM imidazole, pH 8.0). The eluate was dialyzed against Buffer 2 and purified a second time on Ni-NTA beads as described above.
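As a practical aid, the solid components of these buffers can be weighed out from mass = molarity × volume × molar mass. The sketch below does this for Buffer 2. It assumes that Tris is weighed as the free base and titrated to pH 8.0 with HCl, and it leaves the detergents (given as percentages) and PMSF (usually added fresh from a stock) aside. The dictionaries and `grams_needed` are illustrative names.

```python
# Molar masses in g/mol (Tris weighed as the free base, then titrated with HCl).
MOLAR_MASS = {
    "Tris base": 121.14,
    "NaCl": 58.44,
    "urea": 60.06,
    "imidazole": 68.08,
}

# Molar concentrations of the weighed solids in Buffer 2, from the text.
BUFFER_2 = {"Tris base": 0.050, "NaCl": 0.500, "urea": 2.0, "imidazole": 0.020}


def grams_needed(recipe: dict[str, float], volume_l: float) -> dict[str, float]:
    """Mass (g) of each solid for the final volume: mol/L x L x g/mol."""
    return {name: round(conc * volume_l * MOLAR_MASS[name], 2)
            for name, conc in recipe.items()}


print(grams_needed(BUFFER_2, 1.0))
# Buffer 3 differs only in imidazole (100 mM instead of 20 mM).
print(grams_needed({**BUFFER_2, "imidazole": 0.100}, 1.0)["imidazole"])
```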

[0074] Quantitation of Protein. Purified protein was quantitated using the BCA protein assay (Pierce). Estimates of the specific content of EP₃ fusion protein in the lysate were made using a “dot-blot” immunoassay employing the 24H monoclonal antibody, which is directed against the N-terminal domain of the fusion partner. The dot blot assay was performed as follows: bacterial cells expressing the cI-fusion protein were induced for the required time at the appropriate temperature with 1 mM IPTG. The lysate was fractionated to obtain the following fractions: cell lysate, cytosol, washed inclusion bodies, solubilized inclusion bodies and Ni-NTA purified protein. One μl of each purification fraction containing the expressed cI-fusion protein was “spotted” onto nitrocellulose filters and allowed to air dry. Additionally, 1 μl volumes of a range of known amounts of purified fusion protein, the concentration of which was determined by BCA assay, were spotted as a standard curve. The dried blot was then processed for immunodetection: the blot was washed with Tris-buffered saline containing Tween 20 (TBS-T) for 10 min, followed by a 1 hr blocking step using TBS-T containing 5% (w/v) skim milk. The blot was then rinsed briefly with TBS-T and incubated overnight at 4° C. with TBS-T containing 2% skim milk and an appropriate dilution of mouse 24H antibody. The blot was then washed and incubated with horseradish peroxidase-conjugated goat anti-mouse IgG. The blot was subsequently washed 3×10 min with TBS-T, and a chemiluminescence reaction was performed according to the manufacturer's instructions (SuperSignal substrate, Pierce). The intensity of each dot was compared to the known standards by densitometry. The overall yield of the expression/purification was calculated by correcting for the volume of each fraction.
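The densitometry step can be sketched as a linear standard curve, interpolation of each fraction's spot, and a volume correction to total protein per fraction. All numbers below are hypothetical, a straight-line fit is assumed to hold over the standard range, and `fit_line` and `fraction_total_ng` are illustrative names, not part of the described assay.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fraction_total_ng(intensity, slope, intercept, spot_ul, fraction_ml):
    """Total ng of fusion protein in a fraction, from one spot's intensity."""
    ng_per_ul = (intensity - intercept) / slope / spot_ul
    return ng_per_ul * fraction_ml * 1000.0  # 1 ml = 1000 ul


# Hypothetical standard curve: ng of purified fusion protein per 1 ul spot
# versus densitometry signal (arbitrary units).
standards_ng = [0, 10, 20, 40]
signal = [5, 105, 205, 405]
slope, intercept = fit_line(standards_ng, signal)

# Hypothetical fractions: spot signal and total fraction volume (ml).
lysate_ng = fraction_total_ng(805, slope, intercept, spot_ul=1, fraction_ml=50)
purified_ng = fraction_total_ng(405, slope, intercept, spot_ul=1, fraction_ml=10)
print(f"lysate: {lysate_ng / 1e6:.1f} mg, purified: {purified_ng / 1e6:.1f} mg")  # lysate: 4.0 mg, purified: 0.4 mg
print(f"overall yield: {100 * purified_ng / lysate_ng:.0f}%")  # overall yield: 10%
```

Spots whose signal falls outside the standard range would need dilution and re-spotting rather than extrapolation.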

[0075] Production of anti-fusion protein antibodies. The purified fusion protein of this invention can be used to raise specific antibodies (either monoclonal or polyclonal) against the expressed protein according to protocols well known in the art. In the case of the EP₃ receptor, the purified cI-77A-his protein was injected into goats. The first injection was made subcutaneously with 0.5 mg of recombinant protein in complete Freund's adjuvant. Subsequent boost immunizations were made subcutaneously with 0.25 mg of recombinant protein in incomplete Freund's adjuvant. The antiserum was “depleted” by incubation with a lysate of E. coli expressing the intact cI gene that had been covalently coupled to CNBr-activated Sepharose resin. This incubation removes or depletes antibodies directed against the cI fusion partner sequence, as well as any antibodies to E. coli proteins arising either from natural infection or from minor contaminants in the immunogen. The supernatant retains antibodies to the partner EP₃ protein. The resulting depleted antiserum may be further purified by adsorption to the purified cI-77A-his protein antigen coupled to CNBr-activated Sepharose. Purified antibody is subsequently eluted from the resin and can be used, for example, in immunodetection assays to identify the EP₃ protein in native tissues or from recombinant sources.

[0076] Ligand Binding Studies. The fusion proteins of the present invention can be refolded as described herein and used in a variety of assays, such as ligand binding studies. For example, the refolded fusion proteins can be used to screen drugs in a variety of assays well known to one of skill in the art. Because HIV must interact with CCR-5 to gain entry into cells, substances can be screened for antiviral activity by detecting their ability to block the binding of the HIV gp120 coat protein to the CCR-5 receptor protein of this invention. Plastic microtiter plates can be coated with the CCR-5 protein, and radiolabeled HIV gp120 coat protein (which can be complexed with recombinant CD4, the primary HIV receptor) can be added in the presence of the substance to be screened for antiviral activity. The amount of radioactive gp120 coat protein bound to the plate in the absence and presence of the substance can be determined according to standard methods. A decrease in, or absence of, bound gp120, as determined by quantitating the radioactive signal, indicates a substance able to inhibit binding of the HIV gp120 coat protein to the CCR-5 receptor protein, thereby identifying a substance potentially having antiviral activity. The substance can then be further screened for specific antiviral activity according to protocols well known in the art.
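The comparison of bound radioactivity with and without a test substance is usually expressed as percent inhibition. A minimal sketch follows. It assumes a background value from wells lacking CCR-5 is subtracted from both readings; that control is our assumption and is not specified above. The counts are hypothetical.

```python
def percent_inhibition(bound_with, bound_without, background=0.0):
    """Percent inhibition of gp120 binding by a test substance.

    bound_with / bound_without: counts bound in wells with / without the
    substance; background: counts bound to wells lacking CCR-5 (an assumed
    control), subtracted from both readings.
    """
    specific_without = bound_without - background
    if specific_without <= 0:
        raise ValueError("control wells show no specific binding")
    specific_with = bound_with - background
    return 100.0 * (1.0 - specific_with / specific_without)


# Hypothetical counts per minute.
print(percent_inhibition(bound_with=3000, bound_without=10500, background=500))  # 75.0
```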

[0077] The protein of this invention can also be used in saturation binding isotherm experiments. For example, the recombinant membrane protein of this invention is incubated in binding buffer (25 mM KPO₄, pH 6.2, 10 mM MgCl₂ and 1 mM EDTA) for 2 h at 30° C. with varying concentrations of [³H]PGE₂. Nonspecific binding is determined in the presence of 50 μM unlabeled PGE₂. Reactions are stopped by rapid filtration on Whatman GF/F glass fiber filters as described previously (Breyer et al., 1994). Filters are washed three times with binding buffer, dried, and counted in Dupont 989 fluor.
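Saturation data of this kind are commonly reduced by subtracting nonspecific from total binding and fitting a one-site hyperbola, B = Bmax·L/(Kd + L). The sketch below is one way to do the fit without external libraries: for any fixed Kd the best Bmax has a closed form, so only Kd is searched (golden-section search on log Kd). The data are synthetic, with nonspecific binding assumed linear in ligand concentration purely for illustration, and the function names are ours.

```python
import math


def specific_binding(total, nonspecific):
    """Specific binding at each ligand concentration: total minus nonspecific."""
    return [t - n for t, n in zip(total, nonspecific)]


def fit_one_site(conc, bound, kd_lo=1e-3, kd_hi=1e3, iterations=100):
    """Least-squares fit of B = Bmax * L / (Kd + L); returns (Kd, Bmax).

    For a fixed Kd the best Bmax has a closed form, so only Kd is searched,
    by golden-section search on log10(Kd) between kd_lo and kd_hi.
    """
    def bmax_and_sse(kd):
        f = [c / (kd + c) for c in conc]
        bmax = sum(y * fi for y, fi in zip(bound, f)) / sum(fi * fi for fi in f)
        sse = sum((y - bmax * fi) ** 2 for y, fi in zip(bound, f))
        return bmax, sse

    ratio = (math.sqrt(5) - 1) / 2  # golden-section ratio, about 0.618
    lo, hi = math.log10(kd_lo), math.log10(kd_hi)
    for _ in range(iterations):
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if bmax_and_sse(10 ** left)[1] < bmax_and_sse(10 ** right)[1]:
            hi = right  # minimum lies in [lo, right]
        else:
            lo = left   # minimum lies in [left, hi]
    kd = 10 ** ((lo + hi) / 2)
    return kd, bmax_and_sse(kd)[0]


# Synthetic data: [3H]PGE2 in nM, binding in fmol (true Kd 2 nM, Bmax 100 fmol).
conc = [0.25, 0.5, 1, 2, 4, 8, 16]
total = [100 * c / (2 + c) + 0.5 * c for c in conc]
nonspecific = [0.5 * c for c in conc]
kd, bmax = fit_one_site(conc, specific_binding(total, nonspecific))
print(f"Kd = {kd:.2f} nM, Bmax = {bmax:.1f} fmol")
```

With real, noisy data, a dedicated nonlinear regression routine with parameter error estimates would normally be preferred; this sketch only shows the model being fitted.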

[0078] Expression of cI-77A-his fusion protein. The vector pLJM5-22H has the tac promoter and the cI translation initiation sequences. The cDNA encoding the 77A splice variant of the EP₃ receptor was modified as described above to obtain an EP₃ receptor fused to a portion of the cI gene (aa 1-76) at the N-terminus and to a 6× his tag at the C-terminus. The resultant fusion protein of 501 aa has a predicted molecular weight of 55.4 kDa. When expressed in E. coli, this construct demonstrated high levels of expression of a protein with an apparent molecular weight of 50 kDa in whole cell lysates resolved by SDS-PAGE. Expression levels of the cI-77A-his fusion protein were estimated to be in the range of 20 mg/L of E. coli culture, as determined by the dot-blot assay described herein.
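Predicted molecular weights such as 55.4 kDa for 501 aa (and 67.5 kDa for the 610 aa cI-EP₂-HTL protein described below) correspond to roughly 110.6 Da per residue. A sequence-based calculation sums average residue masses plus one water. The sketch below uses standard average residue masses. Because the full fusion sequences are not reproduced here, only a 6× his peptide and the length-based estimate are shown. The function names are ours, and the apparent SDS-PAGE mass is not something this calculation predicts.

```python
# Average residue masses in Da (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def average_mass(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def rough_mass_kda(length: int, mean_residue_da: float = 110.0) -> float:
    """Length-only estimate, assuming a mean residue mass (default 110 Da)."""
    return length * mean_residue_da / 1000.0


print(f"{average_mass('HHHHHH'):.1f} Da")       # 840.9 Da
print(f"{rough_mass_kda(501, 110.6):.1f} kDa")  # 55.4 kDa
print(f"{rough_mass_kda(610, 110.6):.1f} kDa")  # 67.5 kDa
```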

[0079] A second fusion protein, cI-NT (No Tail)-his, in which the C-terminal tail amino acids 356 to 411 of the 77A protein were deleted, was also constructed. The cI-NT-his protein expressed by this construct lacks the epitope to which the anti-peptide antibody was generated and thus served as a control for the anti-tail-peptide antibody in Western blot experiments. A third construct fused the C-terminus of lambda repressor, residues 82-236, to the C-terminus of the cI-77A fusion via a thrombin cleavage site linker. This protein, designated cI-77A-TL (Thrombin-Lambda), tested the hypothesis that the C-terminal sequences of E. coli-expressed proteins are important determinants of protein degradation: because lambda repressor is highly expressed in E. coli, its C-terminus might be resistant to degradation by C-terminal directed proteases. Coomassie blue staining and Western blot analysis of the lysates resolved by SDS-PAGE demonstrated that the induced protein of the appropriate molecular weight was reactive with the 24H antibody for each of the EP₃ constructs tested. The 24H monoclonal antibody was generated against the N-terminal 102 amino acids of the cI gene, and its recognition sequence is contained within the first 36 amino acids of the antigen (Breyer and Sauer, 1989b).

[0080] Western blot analysis was also performed with an antibody raised against the unique sequence in the 77A cDNA as described herein. A protein of 50 kDa was detected in the cI-77A-his lysate and a protein of 68 kDa in the cI-77A-TL lysate, but neither the control lysate containing cI nor the cI-NT-his lysate, in which the target epitope had been deleted, displayed any reactivity.

[0081] Purification of cI-77A-his fusion protein. When the plasmid encoding the cI-77A-his fusion protein was expressed in E. coli, the majority of the cI-77A-his fusion protein produced was found in the insoluble fraction comprising the “inclusion bodies.” Precipitation of a protein into inclusion bodies can be advantageous because the precipitated protein is subjected to minimal proteolysis and can be recovered as a partially pure aggregate. Moreover, mammalian membrane proteins can be toxic when expressed in the E. coli membrane, and sequestration of the PGE₂ EP₃ receptor in inclusion bodies can remove some of the selective disadvantage of PGE₂ EP₃ receptor overexpression.

[0082] Inclusion bodies were collected by centrifugation, which separated them from the majority of soluble protein contaminants. The inclusion bodies were washed extensively in the presence of CHAPS, with only a small loss of cI-77A-his. The washed inclusion bodies were solubilized in 2 M urea in the presence of NP-40 and deoxycholate. Solubilized cI-77A-his was then purified by affinity chromatography using Ni-NTA agarose resin in buffer containing 2 M urea, NP-40 and deoxycholate. After a second round of affinity purification on the Ni-NTA resin, the cI-77A-his was purified to apparent homogeneity as determined by silver staining, with a yield of approximately 10% of the initial protein expressed, or approximately 2 mg of purified protein per liter of bacterial culture.

Example 2 Plasmid pCK2-5HTL: Fusion of the Human EP₂ Receptor With the N-terminal Domain of λ Repressor.

[0083] Construction of cI-EP₂-HTL fusion E. coli expression vector. Plasmid pCK2-5HTL (SEQ ID NO:2) consists of the first 231 bp of a semi-synthetic cI gene (Breyer and Sauer, 1989a), encoding amino acids 1-76, fused to the N-terminus of the E-prostanoid receptor EP₂. The human EP₂ open reading frame was inserted into a plasmid derivative of pRB200 between the EcoRI and XhoI sites by fusing it to the 3′ end of the sequence encoding the first 76 amino acids of lambda repressor at the EcoRI restriction site, thereby creating pCK1-23. The stop codon of the human EP₂ receptor was removed from the hEP₂ sequence by PCR. Briefly, the upstream oligonucleotide primer, originating from position 398 of the hEP₂ ORF, 5′-AGC GCT ACC TCT CGA TCG-3′ (SEQ ID NO:25), and the downstream oligonucleotide primer, directed against the most 3′ region of the hEP₂ ORF, 5′-GCC GCA CTC GAG GCA AGG TCA GCC TGT TTA CT-3′ (SEQ ID NO:26), were used with Vent polymerase to amplify a new fragment lacking the stop codon from the hEP₂ ORF template (the CTC GAG sequence in the downstream primer is an XhoI site). The reaction was run for 30 cycles of the following protocol: 1 min at 95° C., 15 sec at 98° C., 30 sec at 53° C., followed by 1 min at 72° C. The amplified product of 677 bp was digested at internal Bsu36I and XhoI sites, and the appropriate fragment was subcloned into the plasmid pCK-1-23 to create the intermediate plasmid pCK-1-38. Next, an oligonucleotide linker was synthesized to fuse the C-terminal portion of lambda repressor in frame with the hEP₂ sequence. This linker begins with a 6× histidine tag, contains a thrombin cleavage site (aa sequence LVPRGS; SEQ ID NO:15) and introduces a BamHI restriction site. The two oligonucleotides, 5′-TCG AGC CAC CAC CAC CAC CAC TCT AGA CTG GTG CCA CGC G-3′ (SEQ ID NO:27) and 5′-GAT CCG CGT GGC ACC AGT CTA GAG TGG TGG TGG TGG TGG TGG C-3′ (SEQ ID NO:28), were annealed together at 65° C. and subcloned into pCK-1-38 following its digestion with XhoI and BamHI, thereby creating the plasmid pCK2-5-HTL. In these oligonucleotides, the CAC repeats (TGG repeats in the complementary strand) encode the histidine tag, and the CTG GTG CCA CGC G sequence encodes the thrombin cleavage site.

[0084] Induction of cI-EP₂-HTL fusion protein expression. E. coli strain DH5α cells transformed with the pCK2-5-HTL expression plasmid were grown in 2×LB medium containing 100 μg/ml ampicillin (2×LBA100). Cells were grown in 2×LBA100 with shaking at 37° C. until the culture reached an A₆₀₀ of 0.8. Protein expression was induced by addition of 1 mM isopropyl β-D-thiogalactopyranoside (IPTG), followed by further incubation at 30° C. for 5 hours. The cells were then harvested by centrifugation at 2,500×g, flash frozen in liquid nitrogen and stored at −80° C.

[0085] Protein Purification by Ni-NTA Column. Bacterial cell pellets were resuspended in Buffer 1 (50 mM Tris-Cl, 150 mM NaCl, 0.1% NaN₃, 10 mM CHAPS, 20% glycerol, 2 mM PMSF, 1.4 mM β-ME, pH 8.0), followed by sonication three times for 20 sec on ice. The protein was purified from inclusion bodies as described above for the cI-EP₃-his protein.

[0086] Expression of cI-EP₂-HTL fusion protein. The vector pCK2-5-HTL has the tac promoter and the cI translation initiation sequences. The cDNA encoding the human EP₂ receptor was modified as described above for the EP₃ receptor to obtain an EP₂ receptor fused to a portion of the cI gene (aa 1-76) at the N-terminus and to a 6× his tag at the C-terminus, followed by the thrombin cleavage site fused to amino acids 82-236 of λ repressor. The resultant fusion protein of 610 aa has a predicted molecular weight of 67.5 kDa. When expressed in E. coli, this construct demonstrated high levels of expression of a protein with an apparent molecular weight of 60 kDa in whole cell lysates resolved by SDS-PAGE. The expression levels of the cI-EP₂-HTL fusion protein were estimated to be in the range of 20 mg/L of E. coli culture. Western blot analysis of the lysates demonstrated that the induced protein of the appropriate molecular weight was reactive with the 24H antibody for the EP₂ constructs tested.

[0087] Analysis of fusion of the EP₂ receptor to variable-length cI leader sequences. To test the applicability of different bacterial promoters while simultaneously assessing the limit to which the N-terminal sequence may be truncated, the human EP₂ receptor was placed under the control of the bacteriophage T7 promoter with N-terminal fusion sequences of 0, 15, 22, 36 and 76 aa of the N-terminus of cI. Induction of each of these fusion proteins was achieved by the addition of IPTG, and the ability of each construct to express protein was assessed by Western blot. Initially, the N-terminal 76 aa of the cI gene were fused to the EP₂ receptor at an EcoRI restriction site encoding Glu-Phe at amino acids 75 and 76. The N-terminal sequence of 76 amino acids contains efficient translation initiation sequences and has 12 positively charged residues. To determine the minimum sequence requirement for efficient synthesis of recombinant protein from the cI fusion system, deletion analysis of the cI fusion sequence was performed. Deletions were made by PCR mutagenesis employing an upstream primer that overlapped the initiator ATG and introduced an NdeI cloning site. The downstream primer overlapped the terminal sequence and introduced an EcoRI cloning site. These shortened N-terminal fusion sequences, consisting of amino acids 1-76 (SEQ ID NO:11; with 12 positively charged residues), amino acids 1-36 (SEQ ID NO:12; with 9 positively charged residues), amino acids 1-22 (SEQ ID NO:13; with 6 positively charged residues), amino acids 1-15 (SEQ ID NO:14; with 3 positively charged residues) and no leader (as control), were fused to the N-terminus of the EP₂ receptor, and the resultant constructs were expressed in E. coli. Protein expression was monitored by Western blot analysis using an anti-EP₂ sheep polyclonal antibody, as well as the Ni-NTA-HRP conjugate reactive with the 6× his C-terminal fusion (FIG. 1).
Because the 24H epitope lies within the first 36 aa of cI, this monoclonal anti-cI antibody was not useful for these experiments. In addition, these constructs were expressed in the pT7 “pET” vectors (Novagen, Madison, Wis.) to compare the efficacy of the T7 promoter with that of the previously described ptac promoter. The T7 promoter has the advantage of being less “leaky” than tac, i.e., it has a lower basal level of transcription. The results of these comparative studies demonstrated that the ptac promoter was superior to the T7 promoter in yielding high steady-state levels of protein. Overall, the greatest amount of expression was observed with the cI¹⁻⁷⁶ construct, no expression was observed with the cI⁰ (no leader sequence) construct, and intermediate amounts of expression were observed with the cI¹⁻¹⁵, cI¹⁻²² and cI¹⁻³⁶ constructs.
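The positive-charge counts quoted for the truncated leaders can be reproduced by counting Lys and Arg residues in successive prefixes of the cI N-terminal sequence. In the sketch below, the 76-residue sequence (numbered from Ser1, initiator Met omitted) is our assumption, taken from the published λ repressor sequence rather than from SEQ ID NO:11-14, and should be checked against those listings. Under that assumption, it gives 3, 6, 9 and 12 positive residues for the 1-15, 1-22, 1-36 and 1-76 leaders and places Glu-Phe at positions 75-76.

```python
# Assumed N-terminal lambda repressor sequence, residues 1-76 (Ser1 numbering,
# initiator Met omitted); verify against SEQ ID NO:11 before relying on it.
CI_1_76 = (
    "STKKKPLTQE" "QLEDARRLKA" "IYEKKKNELG" "LSQESVADKM"
    "GMGQSGVGAL" "FNGINALNAY" "NAALLAKILK" "VSVEEF"
)


def positive_residues(leader: str) -> int:
    """Count Lys and Arg, the residues positively charged at neutral pH."""
    return sum(aa in "KR" for aa in leader)


for n in (15, 22, 36, 76):
    print(f"cI 1-{n}: {positive_residues(CI_1_76[:n])} positive residues")
print(CI_1_76[74:76])  # EF (residues 75 and 76)
```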

Example 3 Overexpression of Non-EP Receptor Proteins

[0088] The ability of the cI fusion system of this invention to produce a variety of proteins was assessed as follows: the pLJM5-22H plasmid encoding the fusion of the PGE₂ EP₃ receptor with the N-terminal domain of λ repressor was modified to remove the cDNA encoding the PGE₂ EP₃ receptor, and DNA sequences encoding alternative target proteins were inserted as fusion proteins in frame with and downstream from the tac promoter/cI fusion at amino acid 76 of the λ repressor protein. As a general plasmid construction strategy, the target protein was amplified by polymerase chain reaction using an upstream oligonucleotide that overlapped the N-terminal sequence of the target protein and introduced an NdeI site immediately upstream from the ATG start codon. The downstream oligonucleotide overlapped the final codons of the target protein (in general, six codons), removed the stop codon and introduced an XhoI site. This allowed introduction of the target sequence at the NdeI and XhoI restriction sites that flanked the PGE₂ EP₃ receptor in plasmid pLJM5-22H. The resulting constructs were fused at the N-terminus to the first 76 aa of λ repressor and at the C-terminus to the 6× his tag.
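The general primer design strategy above can be expressed as a small function. In this sketch, `design_primers` is a hypothetical helper. Its defaults (a GCGCG 5′ tail on the upstream primer, a GCCG tail on the downstream primer, 18-nt overlaps, and the GCC codon preceding the XhoI site) are inferred from the rho primers (SEQ ID NO:35 and 36) given later in this example; the CCR5 and ROMK primers use different tails. The test ORF is likewise hypothetical: only its two ends are inferred from those primers, and its middle is filler.

```python
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_primers(orf: str, n_overlap: int = 18, up_tail: str = "GCGCG",
                   down_tail: str = "GCCG") -> tuple[str, str]:
    """Hypothetical helper following the general strategy of Example 3.

    Upstream: tail + CAT + first n_overlap nt, so that CAT + ATG forms an NdeI
    site (CATATG) at the start codon. Downstream: reverse complement of the
    last n_overlap coding nt (stop codon removed) followed by GCC TCG AG,
    which adds an in-frame XhoI site (CTCGAG) for the his tag fusion.
    """
    if not orf.startswith("ATG"):
        raise ValueError("ORF must start with ATG")
    coding = orf[:-3] if orf[-3:] in STOPS else orf
    upstream = up_tail + "CAT" + coding[:n_overlap]
    downstream = down_tail + revcomp(coding[-n_overlap:] + "GCCTCGAG")
    return upstream, downstream


# Hypothetical ORF: rho-like ends (inferred from the primers) around filler.
rho_like = "ATGGCTGCCATCCGGAAG" + "AAGAAG" + "TCTGGTTGCCTTGTCTTG" + "TGA"
up, down = design_primers(rho_like)
print(up)    # GCGCGCATATGGCTGCCATCCGGAAG
print(down)  # GCCGCTCGAGGCCAAGACAAGGCAACCAGA
```

With these defaults, the sketch regenerates the two rho primer sequences exactly. This shows that the primers follow the described pattern, but it does not show how the original primers were actually designed.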

[0089] Plasmid pSD1.63 his: fusion of the human CCR5 receptor with the N-terminal domain of λ repressor. Construct pSD1.63 his (SEQ ID NO:3) was assembled according to the general outline described above. The plasmid allowed expression of the nucleotide sequence encoding the CCR5 protein, a chemokine receptor of the GPCR superfamily and the co-receptor for HIV entry into host cells.

[0090] The upstream oligonucleotide primer had the sequence: 5′ GCGC CATATG GAT TAT AAG TGT CAA GTC CAA 3′ (SEQ ID NO:29). The downstream primer had the sequence: 5′ GCCG CT CGA GGC CAA GCC CAC AGA TAT TTC CT 3′ (SEQ ID NO:30).

[0091] In these primers, the CATATG sequence in the upper primer is the NdeI site and the CTCGAG sequence (written CT CGA G) in the lower primer is the XhoI site. The two primers were used to amplify the CCR5 receptor from human genomic DNA using the following reaction conditions: 35 cycles of the following protocol: 1 min at 95° C., 15 sec at 95° C., 15 sec at 68° C., followed by a 10 min final extension at 72° C. The resultant amplified product of 1074 nt was digested with NdeI and XhoI and inserted into the expression vector.
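Because the restriction sites in these primers were originally marked only by underlining, a short scan can locate them directly in the written sequences. The sketch below searches for the recognition sequences of enzymes named in these examples (EcoRI, NdeI, XhoI, NcoI, BamHI, HindIII, SacI, EcoRV). All of these sites are palindromic, so scanning one strand finds sites on both strands. `find_sites` is an illustrative name.

```python
# Recognition sequences of enzymes named in these examples (all palindromic).
SITES = {
    "EcoRI": "GAATTC", "NdeI": "CATATG", "XhoI": "CTCGAG", "NcoI": "CCATGG",
    "BamHI": "GGATCC", "HindIII": "AAGCTT", "SacI": "GAGCTC", "EcoRV": "GATATC",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """0-based start positions of each recognition sequence found in seq."""
    seq = seq.replace(" ", "").upper()
    hits = {}
    for enzyme, site in SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


print(find_sites("GCGC CATATG GAT TAT AAG TGT CAA GTC CAA"))     # {'NdeI': [4]}
print(find_sites("GCCG CT CGA GGC CAA GCC CAC AGA TAT TTC CT"))  # {'XhoI': [4]}
```

The same function applied to the β₂AR and ROMK upper primers reports the EcoRI/NcoI and EcoRI/NdeI pairs described for those constructs.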

[0092] The fusion protein was produced and purified as described for the PGE₂ EP₃ receptor, with modifications for optimal growth, induction times and temperatures. In brief, the culture conditions were as follows: medium was inoculated with a 1:100 dilution of a fresh overnight culture and grown for 24 hours at 30° C. in 2×LB medium containing 2% glucose and the appropriate antibiotics. Cells were collected by centrifugation and resuspended in 2×LB (no glucose) containing 1 mM IPTG and the appropriate antibiotics. Cells were then grown for varying induction times at 30° C. and harvested by centrifugation. For the CCR-5 fusion protein, induction was for 24 hours (overnight).

[0093] Plasmid pSD1.18 his: fusion of the human β₂AR with the N-terminal domain of λ repressor. Construct pSD1.18 his (SEQ ID NO:4) was assembled using a modification of the general outline described above. The natural stop codon of the semi-synthetic β₂AR was removed by PCR; however, the 5′ end of the cDNA was subcloned into the EcoRI site at codon 75/76 of λ repressor, which is immediately upstream of the flanking NdeI site used in the previous constructs.

[0094] The sense oligonucleotide contains sequence internal to the β₂AR and adds NcoI and EcoRI sites at the 5′ end: 5′ GCGCGAATTCACCATG GAAATG AGA CCT GCT GTG ACT TC 3′ (SEQ ID NO:31).

[0095] The mutagenic antisense oligonucleotide primer, which removes the stop codon and introduces the XhoI site for fusion to the 6× his tag, had the sequence: 5′ CCGGG CT CGA GGC TAG CAG TGA GTC ATT TGT ACT ACA AT 3′ (SEQ ID NO:32).

[0096] In these primers, GAATTC and CCATGG are the EcoRI and NcoI restriction sequences in the upper (sense) primer, and CTCGAG (written CT CGA G) is the XhoI sequence in the lower (antisense) primer. These oligonucleotides were used to amplify a small internal β₂AR sequence with convenient EcoRI and XhoI restriction sites at the termini. This fragment was subcloned into the tac/cI fusion expression vector as an intermediate step. The resulting plasmid was digested at the NcoI site and at an EcoRV site internal to the β₂AR sequence contained within the original PCR fragment. The corresponding NcoI-EcoRV fragment of the β₂AR cDNA was subcloned into this site, reconstituting the full-length modified β₂AR sequence. This resulted in the cI¹⁻⁷⁶-β₂AR-6× his fusion protein cloned into the cI-fusion expression vector. The fusion protein was produced and purified as described herein for the EP₃ receptor, with a 24-hour (overnight) induction.

[0097] Plasmid pSD1.134 his: fusion of the ROMK channel with the N-terminal domain of λ repressor. Construct pSD1.134 his (SEQ ID NO:5) was assembled according to the general outline described herein. The upstream oligonucleotide primer had the sequence 5′ GGGAATTC CAT ATG TTC AAA CAC CTC CGA AGA TGG 3′ (SEQ ID NO:33). The downstream primer had the sequence 5′ CCGCTCGAGGC CAT CTG GGT GTC GTC CGT TTCA TC 3′ (SEQ ID NO:34). The underlined sequences delineate the EcoRI and NdeI sites in the upper primer and the XhoI site in the lower primer, respectively. The two primers were used to amplify the ROMK channel from the cloned rat cDNA by PCR. The amplified product was digested with NdeI and XhoI, and the 1.1 kb fragment was inserted into the expression vector. The resulting cI¹⁻⁷⁶ fusion protein was under the control of the tac promoter. The fusion protein was produced and purified as described herein for the PGEP₂EP₃ receptor, with a three- to five-hour induction. The yield of ROMK protein by this method was approximately 1 mg of purified protein per liter of bacterial culture when induced at A₆₀₀ = 1 to 2.
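Digesting the ROMK product with NdeI and XhoI gives two different cohesive ends, so the fragment can enter the vector in only one orientation. The Python sketch below derives the 5′ overhang of each six-base palindromic site from its standard top-strand cut position. The `directional` helper is a simplified, hypothetical check that only compares overhang sequences. It does not model 3′ overhangs, partial digestion or star activity.

```python
from typing import NamedTuple


class Enzyme(NamedTuple):
    site: str  # palindromic recognition sequence, 5'->3'
    cut: int   # top-strand cut offset within the site


ENZYMES = {
    "EcoRI": Enzyme("GAATTC", 1),  # G^AATTC
    "NcoI": Enzyme("CCATGG", 1),   # C^CATGG
    "NdeI": Enzyme("CATATG", 2),   # CA^TATG
    "XhoI": Enzyme("CTCGAG", 1),   # C^TCGAG
    "EcoRV": Enzyme("GATATC", 3),  # GAT^ATC, blunt
}


def overhang(name: str) -> str:
    """5' overhang left by a palindromic site ('' = blunt).

    For a palindrome the bottom-strand cut mirrors the top one, at len(site) - cut.
    """
    site, cut = ENZYMES[name]
    mirror = len(site) - cut
    if cut > mirror:
        raise ValueError("3' overhangs are not handled in this sketch")
    return site[cut:mirror]


def directional(end_a: str, end_b: str) -> bool:
    """True if the two ends cannot ligate to each other.

    Overhangs of palindromic sites are themselves palindromic, so two such ends
    are compatible exactly when their overhangs are identical (or both blunt).
    """
    return overhang(end_a) != overhang(end_b)


print(overhang("NdeI"), overhang("XhoI"))  # TA TCGA
print(directional("NdeI", "XhoI"))         # True
```

Comparing overhang sequences rather than enzyme names means that two different enzymes leaving the same overhang would correctly be reported as compatible.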

[0098] Plasmid pSD2.46his: fusion of the human rho protein with the N-terminal domain of λ repressor. Construct pSD2.46his (SEQ ID NO:6) was assembled according to the general outline described herein. The resulting construct allowed expression of the nucleotide sequence encoding a fusion protein of cI¹⁻⁷⁶ and the human rho protein, a membrane-associated cytoplasmic protein and a member of the family of small G-proteins. The upstream oligonucleotide primer had the sequence 5′ GCGCGC ATATGGCTGCCATCCGGAAG 3′ (SEQ ID NO:35). The downstream primer had the sequence 5′ GCC GCT CGA GGC CAA GAC AAG GCA ACC AGA 3′ (SEQ ID NO:36). The underlined sequences delineate an NdeI site in the upper primer and an XhoI site in the lower primer, respectively. The two primers were used to amplify rho from cloned human cDNA by PCR. The amplified product was digested with NdeI and XhoI, and the 0.6 kb fragment was inserted into the expression vector. The resulting cI¹⁻⁷⁶ fusion protein was under the control of the tac promoter. The fusion protein was produced and purified as described herein for the PGEP₂EP₃ receptor, with a 3- to 24-hour induction.
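The two rho primers can be checked in silico at both ends of the insert. The Python sketch below joins the vector top strand to the NdeI-cut upstream primer. The vector sequence is cI codons 74-76 followed by the NdeI site, as in SEQ ID NO:2, and is cut at the NdeI site before joining. The sketch also reverse-complements the downstream primer, then translates both results with the standard genetic code. Only the top strand is modeled, and the helper names are illustrative.

```python
import itertools

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG.
CODON_TABLE = dict(zip(
    ("".join(p) for p in itertools.product("TCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))
NDEI, NDEI_CUT = "CATATG", 2  # NdeI cuts CA^TATG on the top strand


def translate(seq: str) -> str:
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def revcomp(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def cut_top(seq: str, site: str, cut: int) -> tuple[str, str]:
    """Split the top strand at the first occurrence of `site`, `cut` bases in."""
    i = seq.upper().find(site)
    if i < 0:
        raise ValueError(f"{site} not found")
    return seq[:i + cut], seq[i + cut:]


VECTOR = "GAAGAATTCCATATG"                     # cI codons 74-76 + NdeI, from SEQ ID NO:2
UPSTREAM = "GCGCGCATATGGCTGCCATCCGGAAG"        # SEQ ID NO:35, spaces removed
DOWNSTREAM = "GCCGCTCGAGGCCAAGACAAGGCAACCAGA"  # SEQ ID NO:36, spaces removed

vector_left, _ = cut_top(VECTOR, NDEI, NDEI_CUT)
_, insert_right = cut_top(UPSTREAM, NDEI, NDEI_CUT)
junction = vector_left + insert_right

print(translate(junction))             # EEFHMAAIRK
print(translate(revcomp(DOWNSTREAM)))  # SGCLVLASSG
```

The first translation reads Glu-Glu-Phe from cI and His from the NdeI site. It continues with the Met encoded by the NdeI ATG, followed by the primer-encoded Ala-Ala-Ile-Arg-Lys. The second reads ...Cys-Leu-Val-Leu into Ala-Ser from the XhoI region with no stop codon, so translation can continue past the XhoI site.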

[0099] Although the present process has been described with reference to specific details of certain embodiments thereof, it is not intended that such details should be regarded as limitations upon the scope of the invention except as and to the extent that they are included in the accompanying claims.

[0100] Throughout this application, various publications are referenced. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art to which this invention pertains.

REFERENCES

[0101] Audoly, L. and R. M. Breyer. Substitution of charged amino acid residues in transmembrane regions 6 and 7 affect ligand binding and signal transduction of the prostaglandin EP₃ receptor. Mol. Pharmacol. 51:61-68 (1997).

[0102] Braiman et al. J. Biol. Chem. 262:9271-9276 (1987).

[0103] Breyer, R. M., R. B. Emerson, J. L. Tarng, M. D. Breyer, L. S. Davis, R. M. Abromson and S. M. Ferrenbach. Alternative splicing generates multiple isoforms of a rabbit prostaglandin E₂ receptor. J. Biol. Chem. 269(8):6163-6169 (1994).

[0104] Breyer, R. M. and R. T. Sauer. Mutational analysis of the fine specificity of binding of monoclonal antibody 51F to λ repressor. J. Biol. Chem. 264:13355-13360 (1989a).

[0105] Breyer, R. M. and R. T. Sauer. Production and characterization of monoclonal antibodies to the N-terminal domain of λ repressor. J. Biol. Chem. 264:13348-13354 (1989b).

[0106] Breyer, R. M., A. D. Strosberg and J.-G. Guillet. Mutational analysis of ligand binding of β₂ adrenergic receptor expressed in Escherichia coli. EMBO J. 9(9):2679-2684 (1990).

[0107] Chapot, M. P., Y. Eshdat, S. Marullo, J.-G. Guillet, A. Charbit, A. D. Strosberg and C. Delavier-Klutchko. Localization and characterization of three different β adrenergic receptors expressed in Escherichia coli. Eur. J. Biochem. 187:137-144 (1990).

[0108] Goeddel, D. V. Methods in Enzymology 185:3-7 (1990).

[0109] Harlow and Lane. Antibodies: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1988).

[0110] Le Caherec, F., et al. J. Cell Science 109:1285-1295 (1996).

[0111] Lewin. Genes V. Oxford University Press. Chapter 7, pp. 171-174 (1994).

[0112] Limbird, L. E. Cell surface receptors: A short course on theory and methods. Second edition. Kluwer Academic Publishers, Norwell, Mass. (1996).

[0113] Loisel, T. P., H. Ansanay, S. St-Onge, B. Gay, P. Boulanger, A. D. Strosberg, S. Marullo and M. Bouvier. Recovery of homogeneous and functional β₂ adrenergic receptors from extracellular baculovirus particles. Nature Biotechnology 15:1300-1304 (1997).

[0114] Marullo, S., C. Delavier-Klutchko, Y. Eshdat, A. D. Strosberg and L. J. Emorine. Human β₂ adrenergic receptors expressed in E. coli membranes retain their pharmacological properties. Proc. Natl. Acad. Sci. U.S.A. 85:7551-7555 (1988).

[0115] Michieli et al. Oncogene 12:775-784 (1996).

[0116] Sambrook et al. Molecular Cloning: A Laboratory Manual. 2d Ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1988).

[0117] Sander, P., S. Grünewald, H. Reiländer and H. Michel. Expression of the human D2s dopamine receptor in the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe: a comparative study. FEBS Letters 344:41-46 (1994).

[0118] Sauer, R. T. Biochemistry 17:1092-1100 (1978).

[0119] Sauer et al. Nature 279:396-400 (1979).

[0120] Zeidel, M. L., et al. Biochemistry 31:7436-7441 (1992).

[0121] Zweichal, A., et al. Am. J. Physiol. 261:F187-F196 (1991).

[0122]

1 37 1 5856 DNA Artificial Sequence CDS (300)...(1799) Description ofArtificial Sequence/Note = synthetic construct 1 ttctcatgtt tgacagcttatctcatcgac tgcacggtgc accaatgctt ctggcgtcag 60 gcagccatcg gaagctgtggtatggctgtg caggtcgtaa atcactgcat aattcgtgtc 120 gctcaaggcg cactcccgttctggataatg ttttttgcgc cgacatcata acggttctgg 180 caaatattct gaaatgagctgttgacaatt aatcatcggc tcgtataatg tggaattgtg 240 agcggataac aattaatgtgtgaatgtgag cggatacaat ttcacacagg aaacagcgt 299 atg agc aca aaa aag aaacca tta aca caa gag cag ctt gag gac gca 347 Met Ser Thr Lys Lys Lys ProLeu Thr Gln Glu Gln Leu Glu Asp Ala 1 5 10 15 cgt cgc ctt aaa gca atttat gaa aaa aag aaa aat gaa ctt ggc tta 395 Arg Arg Leu Lys Ala Ile TyrGlu Lys Lys Lys Asn Glu Leu Gly Leu 20 25 30 tcc cag gaa tct gtc gca gacaag atg ggg atg ggg cag tca ggc gtt 443 Ser Gln Glu Ser Val Ala Asp LysMet Gly Met Gly Gln Ser Gly Val 35 40 45 ggt gct tta ttt aat ggc atc aatgca tta aat gct tat aac gcg gca 491 Gly Ala Leu Phe Asn Gly Ile Asn AlaLeu Asn Ala Tyr Asn Ala Ala 50 55 60 ttg cta gca aaa att ctc aaa gtt agcgtt gaa gaa ttc gca gct cat 539 Leu Leu Ala Lys Ile Leu Lys Val Ser ValGlu Glu Phe Ala Ala His 65 70 75 80 atg aag gag acg cgg ggc gac gga gggagc gcc ccc ttc tgc acc cgc 587 Met Lys Glu Thr Arg Gly Asp Gly Gly SerAla Pro Phe Cys Thr Arg 85 90 95 ctc aac cac tcg tat cca ggc atg tgg gcgccc gag gca cgg ggc aac 635 Leu Asn His Ser Tyr Pro Gly Met Trp Ala ProGlu Ala Arg Gly Asn 100 105 110 ctc aca cgc ccc cca ggg ccc ggc gag gactgt ggc tcg gtg tcc gtg 683 Leu Thr Arg Pro Pro Gly Pro Gly Glu Asp CysGly Ser Val Ser Val 115 120 125 gcc ttc ccg atc acc atg ctg atc acc ggcttc gtg ggc aac gcg ctg 731 Ala Phe Pro Ile Thr Met Leu Ile Thr Gly PheVal Gly Asn Ala Leu 130 135 140 gcc atg ctg ctc gtg tcg cgt agc tac cggcgt cgg gag agc aag cgc 779 Ala Met Leu Leu Val Ser Arg Ser Tyr Arg ArgArg Glu Ser Lys Arg 145 150 155 160 aag aag tcg ttc ctg ttg tgc atc ggctgg ctg gcg ctc act gac ctg 827 Lys Lys Ser Phe Leu Leu Cys Ile Gly TrpLeu Ala Leu Thr Asp 
Leu 165 170 175 gtc ggg cag ctg ctc aca agc ccc gtggtc atc ttg gtg tac cta tcc 875 Val Gly Gln Leu Leu Thr Ser Pro Val ValIle Leu Val Tyr Leu Ser 180 185 190 aag cag cgc tgg gag cag ctc gac ccgtcg ggg cgc ctg tgc acc ttc 923 Lys Gln Arg Trp Glu Gln Leu Asp Pro SerGly Arg Leu Cys Thr Phe 195 200 205 ttt ggt ctg acc atg act gtt ttc gggctg tcc tcg ctc ttc atc gcc 971 Phe Gly Leu Thr Met Thr Val Phe Gly LeuSer Ser Leu Phe Ile Ala 210 215 220 agc gcc atg gct gtc gag agg gcg ctggcc atc cgt gcg cca cac tgg 1019 Ser Ala Met Ala Val Glu Arg Ala Leu AlaIle Arg Ala Pro His Trp 225 230 235 240 tac gcg agc cac atg aag acg cgtgcc act cgc gcc gtc ctg ctg ggc 1067 Tyr Ala Ser His Met Lys Thr Arg AlaThr Arg Ala Val Leu Leu Gly 245 250 255 gtg tgg ctg gca gtg ctc gcc ttcgcc ctg cta cct gtg ctg ggt gtg 1115 Val Trp Leu Ala Val Leu Ala Phe AlaLeu Leu Pro Val Leu Gly Val 260 265 270 ggt cag tac acc atc cag tgg cccggg acg tgg tgc ttc atc agc acc 1163 Gly Gln Tyr Thr Ile Gln Trp Pro GlyThr Trp Cys Phe Ile Ser Thr 275 280 285 gga cga ggg gac aac ggg acg agctct tca cac aac tgg ggc aac ctt 1211 Gly Arg Gly Asp Asn Gly Thr Ser SerSer His Asn Trp Gly Asn Leu 290 295 300 ttc ttc gcc tcc acc ttt gcc ttcctg ggc ctc ttg gcg ctg gcc atc 1259 Phe Phe Ala Ser Thr Phe Ala Phe LeuGly Leu Leu Ala Leu Ala Ile 305 310 315 320 acc ttc acc tgc aac ctg gccacc att aag gct ctg gtg tcc cgc tgc 1307 Thr Phe Thr Cys Asn Leu Ala ThrIle Lys Ala Leu Val Ser Arg Cys 325 330 335 cgg gca aag gcg gca gca tcacag tcc agt gcc cag tgg ggc cgg atc 1355 Arg Ala Lys Ala Ala Ala Ser GlnSer Ser Ala Gln Trp Gly Arg Ile 340 345 350 acg acc gag acg gcc atc cagctc atg ggg atc atg tgc gtg ctg tcg 1403 Thr Thr Glu Thr Ala Ile Gln LeuMet Gly Ile Met Cys Val Leu Ser 355 360 365 gtc tgc tgg tcg ccc cta ctgata atg atg ttg aaa atg atc ttc aat 1451 Val Cys Trp Ser Pro Leu Leu IleMet Met Leu Lys Met Ile Phe Asn 370 375 380 cag aca tca gtt gag cac tgcaag aca gac aca gga aag cag aaa gaa 1499 Gln Thr Ser Val Glu His Cys LysThr Asp Thr Gly Lys 
Gln Lys Glu 385 390 395 400 tgc aac ttc ttc tta atagct gtt cgc ctg gct tca ctg aac cag ata 1547 Cys Asn Phe Phe Leu Ile AlaVal Arg Leu Ala Ser Leu Asn Gln Ile 405 410 415 ttg gat ccc tgg gtt tatctg ctg cta aga aag att ctt ctt cgg aag 1595 Leu Asp Pro Trp Val Tyr LeuLeu Leu Arg Lys Ile Leu Leu Arg Lys 420 425 430 ttt tgc cag gta att catgaa aat aat gag cag aag gat gaa att cag 1643 Phe Cys Gln Val Ile His GluAsn Asn Glu Gln Lys Asp Glu Ile Gln 435 440 445 cgt gag aac agg aac gtctca cac agt ggg caa cac gaa gag gcc aga 1691 Arg Glu Asn Arg Asn Val SerHis Ser Gly Gln His Glu Glu Ala Arg 450 455 460 gac agt gag aag agc aaaacc atc cct ggc ctg ttc tcc att ctg ctg 1739 Asp Ser Glu Lys Ser Lys ThrIle Pro Gly Leu Phe Ser Ile Leu Leu 465 470 475 480 cag gct gac cct ggtgct cgt cct tat cag caa gcc tcg agg cac cat 1787 Gln Ala Asp Pro Gly AlaArg Pro Tyr Gln Gln Ala Ser Arg His His 485 490 495 cac cac cac cactgaagcttta atgcggtagt ttatcacagt taaattgcta 1839 His His His His 500acgcagtcag gcaccgtgta tgaaatctaa caatgcgctc atcgtcatcc tcggcaccgt 1899caccctggat gctgtaggca taggcttggt tatgccggta ctgccgggcc tcttgcggga 1959tcgacgcgag gctggatggc cttccccatt atgattcttc tcgcttccgg cggcatcggg 2019atgcccgcgt tgcaggccat gctgtccagg caggtagatg acgaccatca gggacagctt 2079caaggatcgc tcgcggctct taccagccta acttcgatca ctggaccgct gatcgtcacg 2139gcgatttatg ccgcctcggc gagcacatgg aacgggttgg catggattgt aggcgccgcc 2199ctataccttg tctgcctccc cgcgttgcgt cgcggtgcat ggagccgggc cacctcgacc 2259tgaatggaag ccggcggcac ctcgctaacg gattcaccac tccaagaatt ggagccaatc 2319aattcttgcg gagaactgtg aatgcgcaaa ccaacccttg gcagaacata tccatcgcgt 2379ccgccatctc cagcagccgc acgcggcgca tctcgggcag cgttgggtcc tggccacggg 2439tgcgcatgat cgtgctcctg tcgttgagga cccggctagg ctggcggggt tgccttactg 2499gttagcagaa tgaatcaccg atacgcgagc gaacgtgaag cgactgctgc tgcaaaacgt 2559ctgcgacctg agcaacaaca tgaatggtct tcggtttccg tgtttcgtaa agtctggaaa 2619cgcggaagtc agcgccctgc accattatgt tccggatctg catcgcagga tgctgctggc 2679taccctgtgg aacacctaca tctgtattaa cgaagcgctg 
gcattgaccc tgagtgattt 2739ttctctggtc ccgccgcatc cataccgcca gttgtttacc ctcacaacgt tccagtaacc 2799gggcatgttc atcatcagta acccgtatcg tgagcatcct ctctcgtttc atcggtatca 2859ttacccccat gaacagaaat tcccccttac acggaggcat caagtgacca aacaggaaaa 2919aaccgccctt aacatggccc gctttatcag aagccagaca ttaacgcttc tggagaaact 2979caacgagctg gacgcggatg aacaggcaga catctgtgaa tcgcttcacg accacgctga 3039tgagctttac cgcagctgcc tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat 3099gcagctcccg gagacggtca cagcttgtct gtaagcggat gccgggagca gacaagcccg 3159tcagggcgcg tcagcgggtg ttggcgggtg tcggggcgca gccatgaccc agtcacgtag 3219cgatagcgga gtgtatactg gcttaactat gcggcatcag agcagattgt actgagagtg 3279caccatatgc ggtgtgaaat accgcacaga tgcgtaagga gaaaataccg catcaggcgc 3339tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg gcgagcggta 3399tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa cgcaggaaag 3459aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta aaaaggccgc gttgctggcg 3519tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg 3579tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag ctccctcgtg 3639cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct cccttcggga 3699agcgtggcgc tttctcatag ctcacgctgt aggtatctca gttcggtgta ggtcgttcgc 3759tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc cttatccggt 3819aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc agcagccact 3879ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt gaagtggtgg 3939cctaactacg gctacactag aaggacagta tttggtatct gcgctctgct gaagccagtt 3999accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc tggtagcggt 4059ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca agaagatcct 4119ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta agggattttg 4179gtcatgagat tatcaaaaag gatcttcacc tagatccttt taccccggtt gataatcaga 4239aaagccccaa aaacaggaag attgtataag caaatattta aattgtaaac gttaatattt 4299tgttaaaatt cgcgttaaat ttttgttaaa tcagctcatt ttttaaccaa taggccgaaa 4359tcggcaaaat cccttataaa tcaaaagaat agcccgagat agggttgagt gttgttccag 4419tttggaacaa 
gagtccacta ttaaagaacg tggactccaa cgtcaaaggg cgaaaaaccg 4479tctatcaggg cgatggccca ctacgtgaac catcacccaa atcaagtttt ttggggtcga 4539ggtgccgtaa agcactaaat cggaacccta aagggagccc ccgatttaga gcttgacggg 4599gaaagccggc gaacgtggcg agaaaggaag ggaagaaagc gaaaggagcg ggcgctaggg 4659cgctggcaag tgtagcggtc acgctgcgcg taaccaccac acccgccgcg cttaatgcgc 4719cgctacaggg cgcgtaaatc aatctaaagt atatatgagt aaacttggtc tgacagttac 4779caatgcttaa tcagtgaggc acctatctca gcgatctgtc tatttcgttc atccatagtt 4839gcctgactcc ccgtcgtgta gataactacg atacgggagg gcttaccatc tggccccagt 4899gctgcaatga taccgcgaga cccacgctca ccggctccag atttatcagc aataaaccag 4959ccagccggaa gggccgagcg cagaagtggt cctgcaactt tatccgcctc catccagtct 5019attaattgtt gccgggaagc tagagtaagt agttcgccag ttaatagttt gcgcaacgtt 5079gttgccattg ctgcaggcat cgtggtgtca cgctcgtcgt ttggtatggc ttcattcagc 5139tccggttccc aacgatcaag gcgagttaca tgatccccca tgttgtgcaa aaaagcggtt 5199agctccttcg gtcctccgat cgttgtcaga agtaagttgg ccgcagtgtt atcactcatg 5259gttatggcag cactgcataa ttctcttact gtcatgccat ccgtaagatg cttttctgtg 5319actggtgagt actcaaccaa gtcattctga gaatagtgta tgcggcgacc gagttgctct 5379tgcccggcgt caacacggga taataccgcg ccacatagca gaactttaaa agtgctcatc 5439attggaaaac gttcttcggg gcgaaaactc tcaaggatct taccgctgtt gagatccagt 5499tcgatgtaac ccactcgtgc acccaactga tcttcagcat cttttacttt caccagcgtt 5559tctgggtgag caaaaacagg aaggcaaaat gccgcaaaaa agggaataag ggcgacacgg 5619aaatgttgaa tactcatact cttccttttt caatattatt gaagcattta tcagggttat 5679tgtctcatga gcggatacat atttgaatgt atttagaaaa ataaacaaat aggggttccg 5739cgcacatttc cccgaaaagt gccacctgac gtctaagaaa ccattattat catgacatta 5799acctataaaa ataggcgtat cacgaggccc tttcgtcttc aagaattgat cgatcaa 5856 26446 DNA Artificial Sequence CDS (300)...(2126) Description ofArtificial Sequence/note = synthetic construct 2 ttctcatgtt tgacagcttatctcatcgac tgcacggtgc accaatgctt ctggcgtcag 60 gcagccatcg gaagctgtggtatggctgtg caggtcgtaa atcactgcat aattcgtgtc 120 gctcaaggcg cactcccgttctggataatg ttttttgcgc cgacatcata acggttctgg 180 caaatattct 
gaaatgagctgttgacaatt aatcatcggc tcgtataatg tggaattgtg 240 agcggataac aattaatgtgtgaatgtgag cggatacaat ttcacacagg aaacagcgt 299 atg agc aca aaa aag aaacca tta aca caa gag cag ctt gag gac gca 347 Met Ser Thr Lys Lys Lys ProLeu Thr Gln Glu Gln Leu Glu Asp Ala 1 5 10 15 cgt cgc ctt aaa gca atttat gaa aaa aag aaa aat gaa ctt ggc tta 395 Arg Arg Leu Lys Ala Ile TyrGlu Lys Lys Lys Asn Glu Leu Gly Leu 20 25 30 tcc cag gaa tct gtc gca gacaag atg ggg atg ggg cag tca ggc gtt 443 Ser Gln Glu Ser Val Ala Asp LysMet Gly Met Gly Gln Ser Gly Val 35 40 45 ggt gct tta ttt aat ggc atc aatgca tta aat gct tat aac gcg gca 491 Gly Ala Leu Phe Asn Gly Ile Asn AlaLeu Asn Ala Tyr Asn Ala Ala 50 55 60 ttg cta gca aaa att ctc aaa gtt agcgtt gaa gaa ttc cat atg ggc 539 Leu Leu Ala Lys Ile Leu Lys Val Ser ValGlu Glu Phe His Met Gly 65 70 75 80 aat gcc tcc aat gac tcc cag tct gaggac tgc gag acg cga cag tgg 587 Asn Ala Ser Asn Asp Ser Gln Ser Glu AspCys Glu Thr Arg Gln Trp 85 90 95 ctt ccc cca ggc gaa agc cca gcc atc agctcc gtc atg ttc tcg gcc 635 Leu Pro Pro Gly Glu Ser Pro Ala Ile Ser SerVal Met Phe Ser Ala 100 105 110 ggg gtg ctg ggg aac ctc ata gca ctg gcgctg ctg gcg cgc cgc tgg 683 Gly Val Leu Gly Asn Leu Ile Ala Leu Ala LeuLeu Ala Arg Arg Trp 115 120 125 cgg ggg gac gtg ggg tgc agc gcc ggc cgcagg agc tcc ctc tcc ttg 731 Arg Gly Asp Val Gly Cys Ser Ala Gly Arg ArgSer Ser Leu Ser Leu 130 135 140 ttc cac gtg ctg gtg acc gag ctg gtg ttcacc gac ctg ctc ggg acc 779 Phe His Val Leu Val Thr Glu Leu Val Phe ThrAsp Leu Leu Gly Thr 145 150 155 160 tgc ctc atc agc cca gtg gta ctg gcttcg tac gcg cgg aac cag acc 827 Cys Leu Ile Ser Pro Val Val Leu Ala SerTyr Ala Arg Asn Gln Thr 165 170 175 ctg gtg gca ctg gcg ccc gag agc cgcgcg tgc acc tac ttc gct ttc 875 Leu Val Ala Leu Ala Pro Glu Ser Arg AlaCys Thr Tyr Phe Ala Phe 180 185 190 gcc atg acc ttc ttc agc ctg gcc acgatg ctc atg ctc ttc gcc atg 923 Ala Met Thr Phe Phe Ser Leu Ala Thr MetLeu Met Leu Phe Ala Met 195 200 205 gcc ctg gag cgc tac ctc tcg atc 
gggcac ccc tac ttc tac cag cgc 971 Ala Leu Glu Arg Tyr Leu Ser Ile Gly HisPro Tyr Phe Tyr Gln Arg 210 215 220 cgc gtc tcg gcc tcc ggg ggc ctg gccgtg ctg cct gtc atc tat gca 1019 Arg Val Ser Ala Ser Gly Gly Leu Ala ValLeu Pro Val Ile Tyr Ala 225 230 235 240 gtc tcc ctg ctc ttc tgc tcg ctgccg ctg ctg gac tat ggg cag tac 1067 Val Ser Leu Leu Phe Cys Ser Leu ProLeu Leu Asp Tyr Gly Gln Tyr 245 250 255 gtc cag tac tgc ccc ggg acc tggtgc ttc atc cgg cac ggg cgg acc 1115 Val Gln Tyr Cys Pro Gly Thr Trp CysPhe Ile Arg His Gly Arg Thr 260 265 270 gct tac ctg cag ctg tac gcc accctg ctg ctg ctt ctc att gtc tcg 1163 Ala Tyr Leu Gln Leu Tyr Ala Thr LeuLeu Leu Leu Leu Ile Val Ser 275 280 285 gtg ctc gcc tgc aac ttc agt gtcatt ctc aac ctc atc cgc atg cac 1211 Val Leu Ala Cys Asn Phe Ser Val IleLeu Asn Leu Ile Arg Met His 290 295 300 cgc cga agc cgg aga agc cgc tgcgga cct tcc ctg ggc agt ggc cgg 1259 Arg Arg Ser Arg Arg Ser Arg Cys GlyPro Ser Leu Gly Ser Gly Arg 305 310 315 320 ggc ggc ccc ggg gcc cgc aggaga ggg gaa agg gtg tcc atg gcg gag 1307 Gly Gly Pro Gly Ala Arg Arg ArgGly Glu Arg Val Ser Met Ala Glu 325 330 335 gag acg gac cac ctc att ctcctg gct atc atg acc atc acc ttc gcc 1355 Glu Thr Asp His Leu Ile Leu LeuAla Ile Met Thr Ile Thr Phe Ala 340 345 350 gtc tgc tcc ttg cct ttc acgatt ttt gca tat atg aat gaa acc tct 1403 Val Cys Ser Leu Pro Phe Thr IlePhe Ala Tyr Met Asn Glu Thr Ser 355 360 365 tcc cga aag gaa aaa tgg gacctc caa gct ctt agg ttt tta tca att 1451 Ser Arg Lys Glu Lys Trp Asp LeuGln Ala Leu Arg Phe Leu Ser Ile 370 375 380 aat tca ata att gac cct tgggtc ttt gcc atc ctt agg cct cct gtt 1499 Asn Ser Ile Ile Asp Pro Trp ValPhe Ala Ile Leu Arg Pro Pro Val 385 390 395 400 ctg aga cta atg cgt tcagtc ctc tgt tgt cgg att tca tta aga aca 1547 Leu Arg Leu Met Arg Ser ValLeu Cys Cys Arg Ile Ser Leu Arg Thr 405 410 415 caa gat gca aca caa acttcc tgt tct aca cag tca gat gcc agt aaa 1595 Gln Asp Ala Thr Gln Thr SerCys Ser Thr Gln Ser Asp Ala Ser Lys 420 425 430 cag gct gac ctt 
gcc tcgagc cac cac cac cac cac cac tct aga ctg 1643 Gln Ala Asp Leu Ala Ser SerHis His His His His His Ser Arg Leu 435 440 445 gtg cca cgc gga tcc gttcga gaa atc tac gag atg tat gaa gcg gtt 1691 Val Pro Arg Gly Ser Val ArgGlu Ile Tyr Glu Met Tyr Glu Ala Val 450 455 460 agc atg cag ccg tca cttaga agt gag tat gag tac cct gtt ttt tct 1739 Ser Met Gln Pro Ser Leu ArgSer Glu Tyr Glu Tyr Pro Val Phe Ser 465 470 475 480 cat gtt cag gca gggatg ttc tca cct aag ctt aga acc ttt acc aaa 1787 His Val Gln Ala Gly MetPhe Ser Pro Lys Leu Arg Thr Phe Thr Lys 485 490 495 ggt gat gcg gag agatgg gta agc aca acc aaa aaa gcc agt gat tct 1835 Gly Asp Ala Glu Arg TrpVal Ser Thr Thr Lys Lys Ala Ser Asp Ser 500 505 510 gca ttc tgg ctt gaggtt gaa ggt aat tcc atg acc gca cca aca ggc 1883 Ala Phe Trp Leu Glu ValGlu Gly Asn Ser Met Thr Ala Pro Thr Gly 515 520 525 tcc aag cca agc tttcct gac gga atg tta att ctc gtt gac cct gag 1931 Ser Lys Pro Ser Phe ProAsp Gly Met Leu Ile Leu Val Asp Pro Glu 530 535 540 cag gct gtt gag ccaggt gat ttc tgc ata gcc aga ctt ggg ggt gat 1979 Gln Ala Val Glu Pro GlyAsp Phe Cys Ile Ala Arg Leu Gly Gly Asp 545 550 555 560 gag ttt acc ttcaag aaa ctg atc agg gat agc ggt cag gtg ttt tta 2027 Glu Phe Thr Phe LysLys Leu Ile Arg Asp Ser Gly Gln Val Phe Leu 565 570 575 caa cca cta aaccca cag tac cca atg atc cca tgc aat gag agt tgt 2075 Gln Pro Leu Asn ProGln Tyr Pro Met Ile Pro Cys Asn Glu Ser Cys 580 585 590 tcc gtt gtg gggaaa gtt atc gct agt cag tgg cct gaa gag acg ttt 2123 Ser Val Val Gly LysVal Ile Ala Ser Gln Trp Pro Glu Glu Thr Phe 595 600 605 ggc tgatcggcaaggtgttctgg tcggcgcata gctgataaca attgagcaag 2176 Gly aatcttcatcgaattagggg aattttcact cccctcagaa cataacatag taaatggatt 2236 gaattatgaagaatggtttt tatgcgactt accgcagcaa aaataaaggg aaagataagc 2296 gctcaataaacctgtctgtt ttccttaatt ctctgctggc tgataatcat cacctgcagg 2356 ttggctccaattatttgtat attcataaaa tcgataagct ttaatgcggt agtttatcac 2416 agttaaattgctaacgcagt caggcaccgt gtatgaaatc taacaatgcg ctcatcgtca 2476 
tcctcggcaccgtcaccctg gatgctgtag gcataggctt ggttatgccg gtactgccgg 2536 gcctcttgcgggatcgacgc gaggctggat ggccttcccc attatgattc ttctcgcttc 2596 cggcggcatcgggatgcccg cgttgcaggc catgctgtcc aggcaggtag atgacgacca 2656 tcagggacagcttcaaggat cgctcgcggc tcttaccagc ctaacttcga tcactggacc 2716 gctgatcgtcacggcgattt atgccgcctc ggcgagcaca tggaacgggt tggcatggat 2776 tgtaggcgccgccctatacc ttgtctgcct ccccgcgttg cgtcgcggtg catggagccg 2836 ggccacctcgacctgaatgg aagccggcgg cacctcgcta acggattcac cactccaaga 2896 attggagccaatcaattctt gcggagaact gtgaatgcgc aaaccaaccc ttggcagaac 2956 atatccatcgcgtccgccat ctccagcagc cgcacgcggc gcatctcggg cagcgttggg 3016 tcctggccacgggtgcgcat gatcgtgctc ctgtcgttga ggacccggct aggctggcgg 3076 ggttgccttactggttagca gaatgaatca ccgatacgcg agcgaacgtg aagcgactgc 3136 tgctgcaaaacgtctgcgac ctgagcaaca acatgaatgg tcttcggttt ccgtgtttcg 3196 taaagtctggaaacgcggaa gtcagcgccc tgcaccatta tgttccggat ctgcatcgca 3256 ggatgctgctggctaccctg tggaacacct acatctgtat taacgaagcg ctggcattga 3316 ccctgagtgatttttctctg gtcccgccgc atccataccg ccagttgttt accctcacaa 3376 cgttccagtaaccgggcatg ttcatcatca gtaacccgta tcgtgagcat cctctctcgt 3436 ttcatcggtatcattacccc catgaacaga aattccccct tacacggagg catcaagtga 3496 ccaaacaggaaaaaaccgcc cttaacatgg cccgctttat cagaagccag acattaacgc 3556 ttctggagaaactcaacgag ctggacgcgg atgaacaggc agacatctgt gaatcgcttc 3616 acgaccacgctgatgagctt taccgcagct gcctcgcgcg tttcggtgat gacggtgaaa 3676 acctctgacacatgcagctc ccggagacgg tcacagcttg tctgtaagcg gatgccggga 3736 gcagacaagcccgtcagggc gcgtcagcgg gtgttggcgg gtgtcggggc gcagccatga 3796 cccagtcacgtagcgatagc ggagtgtata ctggcttaac tatgcggcat cagagcagat 3856 tgtactgagagtgcaccata tgcggtgtga aataccgcac agatgcgtaa ggagaaaata 3916 ccgcatcaggcgctcttccg cttcctcgct cactgactcg ctgcgctcgg tcgttcggct 3976 gcggcgagcggtatcagctc actcaaaggc ggtaatacgg ttatccacag aatcagggga 4036 taacgcaggaaagaacatgt gagcaaaagg ccagcaaaag gccaggaacc gtaaaaaggc 4096 cgcgttgctggcgtttttcc ataggctccg cccccctgac gagcatcaca aaaatcgacg 4156 ctcaagtcagaggtggcgaa acccgacagg 
actataaaga taccaggcgt ttccccctgg 4216 aagctccctcgtgcgctctc ctgttccgac cctgccgctt accggatacc tgtccgcctt 4276 tctcccttcgggaagcgtgg cgctttctca tagctcacgc tgtaggtatc tcagttcggt 4336 gtaggtcgttcgctccaagc tgggctgtgt gcacgaaccc cccgttcagc ccgaccgctg 4396 cgccttatccggtaactatc gtcttgagtc caacccggta agacacgact tatcgccact 4456 ggcagcagccactggtaaca ggattagcag agcgaggtat gtaggcggtg ctacagagtt 4516 cttgaagtggtggcctaact acggctacac tagaaggaca gtatttggta tctgcgctct 4576 gctgaagccagttaccttcg gaaaaagagt tggtagctct tgatccggca aacaaaccac 4636 cgctggtagcggtggttttt ttgtttgcaa gcagcagatt acgcgcagaa aaaaaggatc 4696 tcaagaagatcctttgatct tttctacggg gtctgacgct cagtggaacg aaaactcacg 4756 ttaagggattttggtcatga gattatcaaa aaggatcttc acctagatcc ttttaccccg 4816 gttgataatcagaaaagccc caaaaacagg aagattgtat aagcaaatat ttaaattgta 4876 aacgttaatattttgttaaa attcgcgtta aatttttgtt aaatcagctc attttttaac 4936 caataggccgaaatcggcaa aatcccttat aaatcaaaag aatagcccga gatagggttg 4996 agtgttgttccagtttggaa caagagtcca ctattaaaga acgtggactc caacgtcaaa 5056 gggcgaaaaaccgtctatca gggcgatggc ccactacgtg aaccatcacc caaatcaagt 5116 tttttggggtcgaggtgccg taaagcacta aatcggaacc ctaaagggag cccccgattt 5176 agagcttgacggggaaagcc ggcgaacgtg gcgagaaagg aagggaagaa agcgaaagga 5236 gcgggcgctagggcgctggc aagtgtagcg gtcacgctgc gcgtaaccac cacacccgcc 5296 gcgcttaatgcgccgctaca gggcgcgtaa atcaatctaa agtatatatg agtaaacttg 5356 gtctgacagttaccaatgct taatcagtga ggcacctatc tcagcgatct gtctatttcg 5416 ttcatccatagttgcctgac tccccgtcgt gtagataact acgatacggg agggcttacc 5476 atctggccccagtgctgcaa tgataccgcg agacccacgc tcaccggctc cagatttatc 5536 agcaataaaccagccagccg gaagggccga gcgcagaagt ggtcctgcaa ctttatccgc 5596 ctccatccagtctattaatt gttgccggga agctagagta agtagttcgc cagttaatag 5656 tttgcgcaacgttgttgcca ttgctgcagg catcgtggtg tcacgctcgt cgtttggtat 5716 ggcttcattcagctccggtt cccaacgatc aaggcgagtt acatgatccc ccatgttgtg 5776 caaaaaagcggttagctcct tcggtcctcc gatcgttgtc agaagtaagt tggccgcagt 5836 gttatcactcatggttatgg cagcactgca taattctctt actgtcatgc catccgtaag 5896 
atgcttttctgtgactggtg agtactcaac caagtcattc tgagaatagt gtatgcggcg 5956 accgagttgctcttgcccgg cgtcaacacg ggataatacc gcgccacata gcagaacttt 6016 aaaagtgctcatcattggaa aacgttcttc ggggcgaaaa ctctcaagga tcttaccgct 6076 gttgagatccagttcgatgt aacccactcg tgcacccaac tgatcttcag catcttttac 6136 tttcaccagcgtttctgggt gagcaaaaac aggaaggcaa aatgccgcaa aaaagggaat 6196 aagggcgacacggaaatgtt gaatactcat actcttcctt tttcaatatt attgaagcat 6256 ttatcagggttattgtctca tgagcggata catatttgaa tgtatttaga aaaataaaca 6316 aataggggttccgcgcacat ttccccgaaa agtgccacct gacgtctaag aaaccattat 6376 tatcatgacattaacctata aaaataggcg tatcacgagg ccctttcgtc ttcaagaatt 6436 gatcgatcaa6446 3 5674 DNA Artificial Sequence CDS (300)...(1616) Description ofArtificial Sequence/note = synthetic construct 3 ttctcatgtt tgacagcttatctcatcgac tgcacggtgc accaatgctt ctggcgtcag 60 gcagccatcg gaagctgtggtatggctgtg caggtcgtaa atcactgcat aattcgtgtc 120 gctcaaggcg cactcccgttctggataatg ttttttgcgc cgacatcata acggttctgg 180 caaatattct gaaatgagctgttgacaatt aatcatcggc tcgtataatg tggaattgtg 240 agcggataac aattaatgtgtgaatgtgag cggatacaat ttcacacagg aaacagcgt 299 atg agc aca aaa aag aaacca tta aca caa gag cag ctt gag gac gca 347 Met Ser Thr Lys Lys Lys ProLeu Thr Gln Glu Gln Leu Glu Asp Ala 1 5 10 15 cgt cgc ctt aaa gca atttat gaa aaa aag aaa aat gaa ctt ggc tta 395 Arg Arg Leu Lys Ala Ile TyrGlu Lys Lys Lys Asn Glu Leu Gly Leu 20 25 30 tcc cag gaa tct gtc gca gacaag atg ggg atg ggg cag tca ggc gtt 443 Ser Gln Glu Ser Val Ala Asp LysMet Gly Met Gly Gln Ser Gly Val 35 40 45 ggt gct tta ttt aat ggc atc aatgca tta aat gct tat aac gcg gca 491 Gly Ala Leu Phe Asn Gly Ile Asn AlaLeu Asn Ala Tyr Asn Ala Ala 50 55 60 ttg cta gca aaa att ctc aaa gtt agcgtt gaa gaa ttc cat atg gat 539 Leu Leu Ala Lys Ile Leu Lys Val Ser ValGlu Glu Phe His Met Asp 65 70 75 80 tat caa gtg tca agt cca atc tat gacatc aat tat tat aca tcg gag 587 Tyr Gln Val Ser Ser Pro Ile Tyr Asp IleAsn Tyr Tyr Thr Ser Glu 85 90 95 ccc tgc caa aaa atc aat gtg aag caa atcgca gcc cgc ctc ctg 
cct 635 Pro Cys Gln Lys Ile Asn Val Lys Gln Ile AlaAla Arg Leu Leu Pro 100 105 110 ccg ctc tac tca ctg gtg ttc atc ttt ggtttt gtg ggc aac atg ctg 683 Pro Leu Tyr Ser Leu Val Phe Ile Phe Gly PheVal Gly Asn Met Leu 115 120 125 gtc atc ctc atc ctg ata aac tgc aaa aggctg aag agc atg act gac 731 Val Ile Leu Ile Leu Ile Asn Cys Lys Arg LeuLys Ser Met Thr Asp 130 135 140 atc tac ctg ctc aac ctg gcc atc tct gacctg ttt ttc ctt ctt act 779 Ile Tyr Leu Leu Asn Leu Ala Ile Ser Asp LeuPhe Phe Leu Leu Thr 145 150 155 160 gtc ccc ttc tgg gct cac tat gct gccgcc cag tgg gac ttt gga aat 827 Val Pro Phe Trp Ala His Tyr Ala Ala AlaGln Trp Asp Phe Gly Asn 165 170 175 aca atg tgt caa ctc ttg aca ggg ctctat ttt ata ggc ttc ttc tct 875 Thr Met Cys Gln Leu Leu Thr Gly Leu TyrPhe Ile Gly Phe Phe Ser 180 185 190 gga atc ttc ttc atc atc ctc ctg acaatc gat agg tac ctg gct gtc 923 Gly Ile Phe Phe Ile Ile Leu Leu Thr IleAsp Arg Tyr Leu Ala Val 195 200 205 gtc cat gct gtg ttt gct tta aaa gccagg acg gtc acc ttt ggg gtg 971 Val His Ala Val Phe Ala Leu Lys Ala ArgThr Val Thr Phe Gly Val 210 215 220 gtg aca agt gtg atc act tgg gtg gtggct gtg ttt gcg tct ctc cca 1019 Val Thr Ser Val Ile Thr Trp Val Val AlaVal Phe Ala Ser Leu Pro 225 230 235 240 gga atc atc ttt acc aga tct caaaaa gaa ggt ctt cat tac acc tgc 1067 Gly Ile Ile Phe Thr Arg Ser Gln LysGlu Gly Leu His Tyr Thr Cys 245 250 255 agc tct cat ttt cca tac agt cagtat caa ttc tgg aag aat ttc cag 1115 Ser Ser His Phe Pro Tyr Ser Gln TyrGln Phe Trp Lys Asn Phe Gln 260 265 270 aca tta aag ata gtc atc ttg gggctg gtc ctg ccg ctg ctt gtc atg 1163 Thr Leu Lys Ile Val Ile Leu Gly LeuVal Leu Pro Leu Leu Val Met 275 280 285 gtc atc tgc tac tcg gga atc ctaaaa act ctg ctt cgg tgt cga aat 1211 Val Ile Cys Tyr Ser Gly Ile Leu LysThr Leu Leu Arg Cys Arg Asn 290 295 300 gag aag aag agg cac agg gct gtgagg ctt atc ttc acc atc atg att 1259 Glu Lys Lys Arg His Arg Ala Val ArgLeu Ile Phe Thr Ile Met Ile 305 310 315 320 gtt tat ttt ctc ttc tgg gctccc tac aac att gtc ctt 
ctc ctg aac 1307 Val Tyr Phe Leu Phe Trp Ala ProTyr Asn Ile Val Leu Leu Leu Asn 325 330 335 acc ttc cag gaa ttc ttt ggcctg aat aat tgc agt agc tct aac agg 1355 Thr Phe Gln Glu Phe Phe Gly LeuAsn Asn Cys Ser Ser Ser Asn Arg 340 345 350 ttg gac caa gct atg cag gtgaca gag act ctt ggg atg acg cac tgc 1403 Leu Asp Gln Ala Met Gln Val ThrGlu Thr Leu Gly Met Thr His Cys 355 360 365 tgc atc aac ccc atc atc tatgcc ttt gtc ggg gag aag ttc aga aac 1451 Cys Ile Asn Pro Ile Ile Tyr AlaPhe Val Gly Glu Lys Phe Arg Asn 370 375 380 tac ctc tta gtc ttc ttc caaaag cac att gcc aaa cgc ttc tgc aaa 1499 Tyr Leu Leu Val Phe Phe Gln LysHis Ile Ala Lys Arg Phe Cys Lys 385 390 395 400 tgc tgt tct att ttc cagcaa gag gct ccc gag cga gca agc tca gtt 1547 Cys Cys Ser Ile Phe Gln GlnGlu Ala Pro Glu Arg Ala Ser Ser Val 405 410 415 tac acc cga tcc act ggggag cag gaa ata tct gtg ggc ttg gcc tcg 1595 Tyr Thr Arg Ser Thr Gly GluGln Glu Ile Ser Val Gly Leu Ala Ser 420 425 430 agg cac cat cac cac caccac tgaaagcttt aatgcggtag tttatcacag 1646 Arg His His His His His His435 ttaaattgct aacgcagtca ggcaccgtgt atgaaatcta acaatgcgct catcgtcatc1706 ctcggcaccg tcaccctgga tgctgtaggc ataggcttgg ttatgccggt actgccgggc1766 ctcttgcggg atcgacgcga ggctggatgg ccttccccat tatgattctt ctcgcttccg1826 gcggcatcgg gatgcccgcg ttgcaggcca tgctgtccag gcaggtagat gacgaccatc1886 agggacagct tcaaggatcg ctcgcggctc ttaccagcct aacttcgatc actggaccgc1946 tgatcgtcac ggcgatttat gccgcctcgg cgagcacatg gaacgggttg gcatggattg2006 taggcgccgc cctatacctt gtctgcctcc ccgcgttgcg tcgcggtgca tggagccggg2066 ccacctcgac ctgaatggaa gccggcggca cctcgctaac ggattcacca ctccaagaat2126 tggagccaat caattcttgc ggagaactgt gaatgcgcaa accaaccctt ggcagaacat2186 atccatcgcg tccgccatct ccagcagccg cacgcggcgc atctcgggca gcgttgggtc2246 ctggccacgg gtgcgcatga tcgtgctcct gtcgttgagg acccggctag gctggcgggg2306 ttgccttact ggttagcaga atgaatcacc gatacgcgag cgaacgtgaa gcgactgctg2366 ctgcaaaacg tctgcgacct gagcaacaac atgaatggtc ttcggtttcc gtgtttcgta2426 aagtctggaa acgcggaagt cagcgccctg 
caccattatg ttccggatct gcatcgcagg2486 atgctgctgg ctaccctgtg gaacacctac atctgtatta acgaagcgct ggcattgacc2546 ctgagtgatt tttctctggt cccgccgcat ccataccgcc agttgtttac cctcacaacg2606 ttccagtaac cgggcatgtt catcatcagt aacccgtatc gtgagcatcc tctctcgttt2666 catcggtatc attaccccca tgaacagaaa ttccccctta cacggaggca tcaagtgacc2726 aaacaggaaa aaaccgccct taacatggcc cgctttatca gaagccagac attaacgctt2786 ctggagaaac tcaacgagct ggacgcggat gaacaggcag acatctgtga atcgcttcac2846 gaccacgctg atgagcttta ccgcagctgc ctcgcgcgtt tcggtgatga cggtgaaaac2906 ctctgacaca tgcagctccc ggagacggtc acagcttgtc tgtaagcgga tgccgggagc2966 agacaagccc gtcagggcgc gtcagcgggt gttggcgggt gtcggggcgc agccatgacc3026 cagtcacgta gcgatagcgg agtgtatact ggcttaacta tgcggcatca gagcagattg3086 tactgagagt gcaccatatg cggtgtgaaa taccgcacag atgcgtaagg agaaaatacc3146 gcatcaggcg ctcttccgct tcctcgctca ctgactcgct gcgctcggtc gttcggctgc3206 ggcgagcggt atcagctcac tcaaaggcgg taatacggtt atccacagaa tcaggggata3266 acgcaggaaa gaacatgtga gcaaaaggcc agcaaaaggc caggaaccgt aaaaaggccg3326 cgttgctggc gtttttccat aggctccgcc cccctgacga gcatcacaaa aatcgacgct3386 caagtcagag gtggcgaaac ccgacaggac tataaagata ccaggcgttt ccccctggaa3446 gctccctcgt gcgctctcct gttccgaccc tgccgcttac cggatacctg tccgcctttc3506 tcccttcggg aagcgtggcg ctttctcata gctcacgctg taggtatctc agttcggtgt3566 aggtcgttcg ctccaagctg ggctgtgtgc acgaaccccc cgttcagccc gaccgctgcg3626 ccttatccgg taactatcgt cttgagtcca acccggtaag acacgactta tcgccactgg3686 cagcagccac tggtaacagg attagcagag cgaggtatgt aggcggtgct acagagttct3746 tgaagtggtg gcctaactac ggctacacta gaaggacagt atttggtatc tgcgctctgc3806 tgaagccagt taccttcgga aaaagagttg gtagctcttg atccggcaaa caaaccaccg3866 ctggtagcgg tggttttttt gtttgcaagc agcagattac gcgcagaaaa aaaggatctc3926 aagaagatcc tttgatcttt tctacggggt ctgacgctca gtggaacgaa aactcacgtt3986 aagggatttt ggtcatgaga ttatcaaaaa ggatcttcac ctagatcctt ttaccccggt4046 tgataatcag aaaagcccca aaaacaggaa gattgtataa gcaaatattt aaattgtaaa4106 cgttaatatt ttgttaaaat tcgcgttaaa tttttgttaa atcagctcat tttttaacca4166 
ataggccgaa atcggcaaaa tcccttataa atcaaaagaa tagcccgaga tagggttgag4226 tgttgttcca gtttggaaca agagtccact attaaagaac gtggactcca acgtcaaagg4286 gcgaaaaacc gtctatcagg gcgatggccc actacgtgaa ccatcaccca aatcaagttt4346 tttggggtcg aggtgccgta aagcactaaa tcggaaccct aaagggagcc cccgatttag4406 agcttgacgg ggaaagccgg cgaacgtggc gagaaaggaa gggaagaaag cgaaaggagc4466 gggcgctagg gcgctggcaa gtgtagcggt cacgctgcgc gtaaccacca cacccgccgc4526 gcttaatgcg ccgctacagg gcgcgtaaat caatctaaag tatatatgag taaacttggt4586 ctgacagtta ccaatgctta atcagtgagg cacctatctc agcgatctgt ctatttcgtt4646 catccatagt tgcctgactc cccgtcgtgt agataactac gatacgggag ggcttaccat4706 ctggccccag tgctgcaatg ataccgcgag acccacgctc accggctcca gatttatcag4766 caataaacca gccagccgga agggccgagc gcagaagtgg tcctgcaact ttatccgcct4826 ccatccagtc tattaattgt tgccgggaag ctagagtaag tagttcgcca gttaatagtt4886 tgcgcaacgt tgttgccatt gctgcaggca tcgtggtgtc acgctcgtcg tttggtatgg4946 cttcattcag ctccggttcc caacgatcaa ggcgagttac atgatccccc atgttgtgca5006 aaaaagcggt tagctccttc ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt5066 tatcactcat ggttatggca gcactgcata attctcttac tgtcatgcca tccgtaagat5126 gcttttctgt gactggtgag tactcaacca agtcattctg agaatagtgt atgcggcgac5186 cgagttgctc ttgcccggcg tcaacacggg ataataccgc gccacatagc agaactttaa5246 aagtgctcat cattggaaaa cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt5306 tgagatccag ttcgatgtaa cccactcgtg cacccaactg atcttcagca tcttttactt5366 tcaccagcgt ttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa5426 gggcgacacg gaaatgttga atactcatac tcttcctttt tcaatattat tgaagcattt5486 atcagggtta ttgtctcatg agcggataca tatttgaatg tatttagaaa aataaacaaa5546 taggggttcc gcgcacattt ccccgaaaag tgccacctga cgtctaagaa accattatta5606 tcatgacatt aacctataaa aataggcgta tcacgaggcc ctttcgtctt caagaattga5666 tcgatcaa 5674 4 5857 DNA Artificial Sequence CDS (300)...(1799)Description of Artificial Sequence/note = synthetic construct 4ttctcatgtt tgacagctta tctcatcgac tgcacggtgc accaatgctt ctggcgtcag 60gcagccatcg gaagctgtgg tatggctgtg caggtcgtaa atcactgcat 
aattcgtgtc 120gctcaaggcg cactcccgtt ctggataatg ttttttgcgc cgacatcata acggttctgg 180caaatattct gaaatgagct gttgacaatt aatcatcggc tcgtataatg tggaattgtg 240agcggataac aattaatgtg tgaatgtgag cggatacaat ttcacacagg aaacagcgt 299 atgagc aca aaa aag aaa cca tta aca caa gag cag ctt gag gac gca 347 Met SerThr Lys Lys Lys Pro Leu Thr Gln Glu Gln Leu Glu Asp Ala 1 5 10 15 cgtcgc ctt aaa gca att tat gaa aaa aag aaa aat gaa ctt ggc tta 395 Arg ArgLeu Lys Ala Ile Tyr Glu Lys Lys Lys Asn Glu Leu Gly Leu 20 25 30 tcc caggaa tct gtc gca gac aag atg ggg atg ggg cag tca ggc gtt 443 Ser Gln GluSer Val Ala Asp Lys Met Gly Met Gly Gln Ser Gly Val 35 40 45 ggt gct ttattt aat ggc atc aat gca tta aat gct tat aac gcg gca 491 Gly Ala Leu PheAsn Gly Ile Asn Ala Leu Asn Ala Tyr Asn Ala Ala 50 55 60 ttg cta gca aaaatt ctc aaa gtt agc gtt gaa gaa ttc acc atg ggg 539 Leu Leu Ala Lys IleLeu Lys Val Ser Val Glu Glu Phe Thr Met Gly 65 70 75 80 caa ccc ggg aacggc agc gcc ttc ttg ctg gca ccc aat gga agc cat 587 Gln Pro Gly Asn GlySer Ala Phe Leu Leu Ala Pro Asn Gly Ser His 85 90 95 gcg ccg gac cac gacgtc acg cag caa agg gac gag gtg tgg gtg gtg 635 Ala Pro Asp His Asp ValThr Gln Gln Arg Asp Glu Val Trp Val Val 100 105 110 ggc atg ggc atc gtcatg tct ctc atc gtc ctg gcc atc gtg ttt ggc 683 Gly Met Gly Ile Val MetSer Leu Ile Val Leu Ala Ile Val Phe Gly 115 120 125 aat gtg ctg gtc atcaca gcc att gcc aag ttc gag cgt ctg cag acg 731 Asn Val Leu Val Ile ThrAla Ile Ala Lys Phe Glu Arg Leu Gln Thr 130 135 140 gtc acc aac tac ttcatc aca agc ttg gcc tgt gct gat ctg gtc atg 779 Val Thr Asn Tyr Phe IleThr Ser Leu Ala Cys Ala Asp Leu Val Met 145 150 155 160 ggg cta gca gtggtg ccc ttt ggg gcc gcc cat att ctc atg aaa atg 827 Gly Leu Ala Val ValPro Phe Gly Ala Ala His Ile Leu Met Lys Met 165 170 175 tgg act ttt ggcaac ttc tgg tgc gaa ttc tgg act tcc att gat gtg 875 Trp Thr Phe Gly AsnPhe Trp Cys Glu Phe Trp Thr Ser Ile Asp Val 180 185 190 ctg tgc gtc acggca tcg att gag acc ctg tgc gtg atc gca gtc gac 923 Leu Cys Val Thr 
AlaSer Ile Glu Thr Leu Cys Val Ile Ala Val Asp 195 200 205 cgc tac ttt gccatt act agt cct ttc aag tac cag agc ctg ctg acc 971 Arg Tyr Phe Ala IleThr Ser Pro Phe Lys Tyr Gln Ser Leu Leu Thr 210 215 220 aag aat aag gcccgg gtg atc att ctg atg gtg tgg att gtg tca ggc 1019 Lys Asn Lys Ala ArgVal Ile Ile Leu Met Val Trp Ile Val Ser Gly 225 230 235 240 ctt acc tccttc ttg ccc att cag atg cac tgg tac agg gcc acc cac 1067 Leu Thr Ser PheLeu Pro Ile Gln Met His Trp Tyr Arg Ala Thr His 245 250 255 cag gaa gccatc aac tgc tat gcc aat gag acc tgc tgt gac ttc ttc 1115 Gln Glu Ala IleAsn Cys Tyr Ala Asn Glu Thr Cys Cys Asp Phe Phe 260 265 270 acg aac caagcc tat gcc att gcc tct tcc atc gtg tcc ttc tac gtt 1163 Thr Asn Gln AlaTyr Ala Ile Ala Ser Ser Ile Val Ser Phe Tyr Val 275 280 285 ccc ctg gtgatc atg gtc ttc gtc tac tcc agg gtc ttt cag gag gcc 1211 Pro Leu Val IleMet Val Phe Val Tyr Ser Arg Val Phe Gln Glu Ala 290 295 300 aaa agg cagctc cag aag att gac aaa tct gag ggc cgc ttc cat gtc 1259 Lys Arg Gln LeuGln Lys Ile Asp Lys Ser Glu Gly Arg Phe His Val 305 310 315 320 cag aacctt agc cag gtg gag cag gat ggg cgg acg ggg cat gga ctc 1307 Gln Asn LeuSer Gln Val Glu Gln Asp Gly Arg Thr Gly His Gly Leu 325 330 335 cgc agatct tcc aag ttc tgc ttg aag gag cac aaa gcc ctc aag acg 1355 Arg Arg SerSer Lys Phe Cys Leu Lys Glu His Lys Ala Leu Lys Thr 340 345 350 tta ggcatc atc atg ggc act ttc acc ctc tgc tgg ctg ccc ttc ttc 1403 Leu Gly IleIle Met Gly Thr Phe Thr Leu Cys Trp Leu Pro Phe Phe 355 360 365 atc gttaac att gtg cat gtg atc cag gat aac ctc atc cgt aag gaa 1451 Ile Val AsnIle Val His Val Ile Gln Asp Asn Leu Ile Arg Lys Glu 370 375 380 gtt tacatc ctc cta aat tgg ata ggc tat gtc aat tct ggt ttc aat 1499 Val Tyr IleLeu Leu Asn Trp Ile Gly Tyr Val Asn Ser Gly Phe Asn 385 390 395 400 cccctt atc tac tgc cgg agc cca gat ttc agg att gcc ttc cag gag 1547 Pro LeuIle Tyr Cys Arg Ser Pro Asp Phe Arg Ile Ala Phe Gln Glu 405 410 415 cttctg tgc ctg cgc agg tct tct ttg aag gcc tat ggc aat ggc tac 1595 
Leu LeuCys Leu Arg Arg Ser Ser Leu Lys Ala Tyr Gly Asn Gly Tyr 420 425 430 tccagc aac ggc aac aca ggg gag cag agt gga tat cac gtg gaa cag 1643 Ser SerAsn Gly Asn Thr Gly Glu Gln Ser Gly Tyr His Val Glu Gln 435 440 445 gagaaa gaa aat aaa ctg ctg tgt gaa gac ctc cca ggc acg gaa gac 1691 Glu LysGlu Asn Lys Leu Leu Cys Glu Asp Leu Pro Gly Thr Glu Asp 450 455 460 tttgtg ggc cat caa ggt act gtg cct agc gat aac att gat tca caa 1739 Phe ValGly His Gln Gly Thr Val Pro Ser Asp Asn Ile Asp Ser Gln 465 470 475 480ggg agg aat tgt agt aca aat gac tca ctg cta gcc tcg agg cac cat 1787 GlyArg Asn Cys Ser Thr Asn Asp Ser Leu Leu Ala Ser Arg His His 485 490 495cac cac cac cac tgaaagcttt aatgcggtag tttatcacag ttaaattgct 1839 His HisHis His 500 aacgcagtca ggcaccgtgt atgaaatcta acaatgcgct catcgtcatcctcggcaccg 1899 tcaccctgga tgctgtaggc ataggcttgg ttatgccggt actgccgggcctcttgcggg 1959 atcgacgcga ggctggatgg ccttccccat tatgattctt ctcgcttccggcggcatcgg 2019 gatgcccgcg ttgcaggcca tgctgtccag gcaggtagat gacgaccatcagggacagct 2079 tcaaggatcg ctcgcggctc ttaccagcct aacttcgatc actggaccgctgatcgtcac 2139 ggcgatttat gccgcctcgg cgagcacatg gaacgggttg gcatggattgtaggcgccgc 2199 cctatacctt gtctgcctcc ccgcgttgcg tcgcggtgca tggagccgggccacctcgac 2259 ctgaatggaa gccggcggca cctcgctaac ggattcacca ctccaagaattggagccaat 2319 caattcttgc ggagaactgt gaatgcgcaa accaaccctt ggcagaacatatccatcgcg 2379 tccgccatct ccagcagccg cacgcggcgc atctcgggca gcgttgggtcctggccacgg 2439 gtgcgcatga tcgtgctcct gtcgttgagg acccggctag gctggcggggttgccttact 2499 ggttagcaga atgaatcacc gatacgcgag cgaacgtgaa gcgactgctgctgcaaaacg 2559 tctgcgacct gagcaacaac atgaatggtc ttcggtttcc gtgtttcgtaaagtctggaa 2619 acgcggaagt cagcgccctg caccattatg ttccggatct gcatcgcaggatgctgctgg 2679 ctaccctgtg gaacacctac atctgtatta acgaagcgct ggcattgaccctgagtgatt 2739 tttctctggt cccgccgcat ccataccgcc agttgtttac cctcacaacgttccagtaac 2799 cgggcatgtt catcatcagt aacccgtatc gtgagcatcc tctctcgtttcatcggtatc 2859 attaccccca tgaacagaaa ttccccctta cacggaggca tcaagtgaccaaacaggaaa 2919 
aaaccgccct taacatggcc cgctttatca gaagccagac attaacgcttctggagaaac 2979 tcaacgagct ggacgcggat gaacaggcag acatctgtga atcgcttcacgaccacgctg 3039 atgagcttta ccgcagctgc ctcgcgcgtt tcggtgatga cggtgaaaacctctgacaca 3099 tgcagctccc ggagacggtc acagcttgtc tgtaagcgga tgccgggagcagacaagccc 3159 gtcagggcgc gtcagcgggt gttggcgggt gtcggggcgc agccatgacccagtcacgta 3219 gcgatagcgg agtgtatact ggcttaacta tgcggcatca gagcagattgtactgagagt 3279 gcaccatatg cggtgtgaaa taccgcacag atgcgtaagg agaaaataccgcatcaggcg 3339 ctcttccgct tcctcgctca ctgactcgct gcgctcggtc gttcggctgcggcgagcggt 3399 atcagctcac tcaaaggcgg taatacggtt atccacagaa tcaggggataacgcaggaaa 3459 gaacatgtga gcaaaaggcc agcaaaaggc caggaaccgt aaaaaggccgcgttgctggc 3519 gtttttccat aggctccgcc cccctgacga gcatcacaaa aatcgacgctcaagtcagag 3579 gtggcgaaac ccgacaggac tataaagata ccaggcgttt ccccctggaagctccctcgt 3639 gcgctctcct gttccgaccc tgccgcttac cggatacctg tccgcctttctcccttcggg 3699 aagcgtggcg ctttctcata gctcacgctg taggtatctc agttcggtgtaggtcgttcg 3759 ctccaagctg ggctgtgtgc acgaaccccc cgttcagccc gaccgctgcgccttatccgg 3819 taactatcgt cttgagtcca acccggtaag acacgactta tcgccactggcagcagccac 3879 tggtaacagg attagcagag cgaggtatgt aggcggtgct acagagttcttgaagtggtg 3939 gcctaactac ggctacacta gaaggacagt atttggtatc tgcgctctgctgaagccagt 3999 taccttcgga aaaagagttg gtagctcttg atccggcaaa caaaccaccgctggtagcgg 4059 tggttttttt gtttgcaagc agcagattac gcgcagaaaa aaaggatctcaagaagatcc 4119 tttgatcttt tctacggggt ctgacgctca gtggaacgaa aactcacgttaagggatttt 4179 ggtcatgaga ttatcaaaaa ggatcttcac ctagatcctt ttaccccggttgataatcag 4239 aaaagcccca aaaacaggaa gattgtataa gcaaatattt aaattgtaaacgttaatatt 4299 ttgttaaaat tcgcgttaaa tttttgttaa atcagctcat tttttaaccaataggccgaa 4359 atcggcaaaa tcccttataa atcaaaagaa tagcccgaga tagggttgagtgttgttcca 4419 gtttggaaca agagtccact attaaagaac gtggactcca acgtcaaagggcgaaaaacc 4479 gtctatcagg gcgatggccc actacgtgaa ccatcaccca aatcaagttttttggggtcg 4539 aggtgccgta aagcactaaa tcggaaccct aaagggagcc cccgatttagagcttgacgg 4599 ggaaagccgg cgaacgtggc gagaaaggaa 
gggaagaaag cgaaaggagcgggcgctagg 4659 gcgctggcaa gtgtagcggt cacgctgcgc gtaaccacca cacccgccgcgcttaatgcg 4719 ccgctacagg gcgcgtaaat caatctaaag tatatatgag taaacttggtctgacagtta 4779 ccaatgctta atcagtgagg cacctatctc agcgatctgt ctatttcgttcatccatagt 4839 tgcctgactc cccgtcgtgt agataactac gatacgggag ggcttaccatctggccccag 4899 tgctgcaatg ataccgcgag acccacgctc accggctcca gatttatcagcaataaacca 4959 gccagccgga agggccgagc gcagaagtgg tcctgcaact ttatccgcctccatccagtc 5019 tattaattgt tgccgggaag ctagagtaag tagttcgcca gttaatagtttgcgcaacgt 5079 tgttgccatt gctgcaggca tcgtggtgtc acgctcgtcg tttggtatggcttcattcag 5139 ctccggttcc caacgatcaa ggcgagttac atgatccccc atgttgtgcaaaaaagcggt 5199 tagctccttc ggtcctccga tcgttgtcag aagtaagttg gccgcagtgttatcactcat 5259 ggttatggca gcactgcata attctcttac tgtcatgcca tccgtaagatgcttttctgt 5319 gactggtgag tactcaacca agtcattctg agaatagtgt atgcggcgaccgagttgctc 5379 ttgcccggcg tcaacacggg ataataccgc gccacatagc agaactttaaaagtgctcat 5439 cattggaaaa cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgttgagatccag 5499 ttcgatgtaa cccactcgtg cacccaactg atcttcagca tcttttactttcaccagcgt 5559 ttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataagggcgacacg 5619 gaaatgttga atactcatac tcttcctttt tcaatattat tgaagcatttatcagggtta 5679 ttgtctcatg agcggataca tatttgaatg tatttagaaa aataaacaaataggggttcc 5739 gcgcacattt ccccgaaaag tgccacctga cgtctaagaa accattattatcatgacatt 5799 aacctataaa aataggcgta tcacgaggcc ctttcgtctt caagaattgatcgatcaa 5857 5 5734 DNA Artificial Sequence CDS (300)...(1676)Description of Artificial Sequence/note = synthetic construct 5ttctcatgtt tgacagctta tctcatcgac tgcacggtgc accaatgctt ctggcgtcag 60gcagccatcg gaagctgtgg tatggctgtg caggtcgtaa atcactgcat aattcgtgtc 120gctcaaggcg cactcccgtt ctggataatg ttttttgcgc cgacatcata acggttctgg 180caaatattct gaaatgagct gttgacaatt aatcatcggc tcgtataatg tggaattgtg 240agcggataac aattaatgtg tgaatgtgag cggatacaat ttcacacagg aaacagcgt 299 atgagc aca aaa aag aaa cca tta aca caa gag cag ctt gag gac gca 347 Met SerThr Lys Lys Lys Pro Leu Thr Gln 
Glu Gln Leu Glu Asp Ala 1 5 10 15 cgtcgc ctt aaa gca att tat gaa aaa aag aaa aat gaa ctt ggc tta 395 Arg ArgLeu Lys Ala Ile Tyr Glu Lys Lys Lys Asn Glu Leu Gly Leu 20 25 30 tcc caggaa tct gtc gca gac aag atg ggg atg ggg cag tca ggc gtt 443 Ser Gln GluSer Val Ala Asp Lys Met Gly Met Gly Gln Ser Gly Val 35 40 45 ggt gct ttattt aat ggc atc aat gca tta aat gct tat aac gcg gca 491 Gly Ala Leu PheAsn Gly Ile Asn Ala Leu Asn Ala Tyr Asn Ala Ala 50 55 60 ttg cta gca aaaatt ctc aaa gtt agc gtt gaa gaa ttc cat atg ttc 539 Leu Leu Ala Lys IleLeu Lys Val Ser Val Glu Glu Phe His Met Phe 65 70 75 80 aaa cac ctc cgaaga tgg ttt atc act cac ata ttt ggg cgt tcc cgg 587 Lys His Leu Arg ArgTrp Phe Ile Thr His Ile Phe Gly Arg Ser Arg 85 90 95 caa cgg gca agg ctggtc tct aaa gaa gga aga tgt aac atc gag ttt 635 Gln Arg Ala Arg Leu ValSer Lys Glu Gly Arg Cys Asn Ile Glu Phe 100 105 110 ggc aat gtg gat gcacag tca agg ttt ata ttc ttt gtg gac atc tgg 683 Gly Asn Val Asp Ala GlnSer Arg Phe Ile Phe Phe Val Asp Ile Trp 115 120 125 aca act gtg ctg gacctg aaa tgg agg tac aaa atg acc gtg ttc atc 731 Thr Thr Val Leu Asp LeuLys Trp Arg Tyr Lys Met Thr Val Phe Ile 130 135 140 aca gcc ttc ttg gggagt tgg ttc ctc ttt ggt ctc ctg tgg tat gtc 779 Thr Ala Phe Leu Gly SerTrp Phe Leu Phe Gly Leu Leu Trp Tyr Val 145 150 155 160 gta gcg tat gttcat aag gac ctc cca gag ttc tac ccg cct gac aac 827 Val Ala Tyr Val HisLys Asp Leu Pro Glu Phe Tyr Pro Pro Asp Asn 165 170 175 cgc act cct tgtgtg gag aac att aat ggc atg act tca gcc ttt ctg 875 Arg Thr Pro Cys ValGlu Asn Ile Asn Gly Met Thr Ser Ala Phe Leu 180 185 190 ttt tct cta gagact caa gtg acc ata ggt tac gga ttc agg ttt gtg 923 Phe Ser Leu Glu ThrGln Val Thr Ile Gly Tyr Gly Phe Arg Phe Val 195 200 205 aca gaa cag tgcgcc act gcc att ttc ctg ctt atc ttc cag tct att 971 Thr Glu Gln Cys AlaThr Ala Ile Phe Leu Leu Ile Phe Gln Ser Ile 210 215 220 ctt gga gtg atcatc aat tcc ttc atg tgt ggt gcc att tta gcc aag 1019 Leu Gly Val Ile IleAsn Ser Phe Met Cys Gly Ala Ile Leu Ala 
Lys 225 230 235 240 atc tct agaccc aaa aaa cgt gct aaa acc att acg ttc agc aag aat 1067 Ile Ser Arg ProLys Lys Arg Ala Lys Thr Ile Thr Phe Ser Lys Asn 245 250 255 gcg gtg atcagc aag cgt ggc ggg aag ctc tgc ctc ctc atc cga gtg 1115 Ala Val Ile SerLys Arg Gly Gly Lys Leu Cys Leu Leu Ile Arg Val 260 265 270 gcc aat cttagg aag agc ctt ctg att ggc agc cac ata tat ggc aag 1163 Ala Asn Leu ArgLys Ser Leu Leu Ile Gly Ser His Ile Tyr Gly Lys 275 280 285 ctt cta aagaca acc atc act cct gaa ggc gag acc atc att ttg gat 1211 Leu Leu Lys ThrThr Ile Thr Pro Glu Gly Glu Thr Ile Ile Leu Asp 290 295 300 cag acc aacatc aac ttt gtc gtc gac gct ggc aat gaa aat ttg ttc 1259 Gln Thr Asn IleAsn Phe Val Val Asp Ala Gly Asn Glu Asn Leu Phe 305 310 315 320 ttc atatcc cca ctg acg atc tac cac att att gac cac aac agc cct 1307 Phe Ile SerPro Leu Thr Ile Tyr His Ile Ile Asp His Asn Ser Pro 325 330 335 ttc ttccac atg gca gca gaa act ctt tcc caa cag gac ttt gag ctg 1355 Phe Phe HisMet Ala Ala Glu Thr Leu Ser Gln Gln Asp Phe Glu Leu 340 345 350 gtg gtcttt tta gat ggc aca gtg gaa tcc acc agt gca acc tgc cag 1403 Val Val PheLeu Asp Gly Thr Val Glu Ser Thr Ser Ala Thr Cys Gln 355 360 365 gtc cgcacg tca tac gtc cca gag gag gtg ctt tgg ggc tac cgt ttc 1451 Val Arg ThrSer Tyr Val Pro Glu Glu Val Leu Trp Gly Tyr Arg Phe 370 375 380 gtt cctatt gtg tcc aag acc aag gaa ggg aaa tac cga gtt gat ttt 1499 Val Pro IleVal Ser Lys Thr Lys Glu Gly Lys Tyr Arg Val Asp Phe 385 390 395 400 cataac ttc ggt aag aca gtg gaa gtg gag acc cct cac tgt gcc atg 1547 His AsnPhe Gly Lys Thr Val Glu Val Glu Thr Pro His Cys Ala Met 405 410 415 tgcctc tat aat gag aaa gat gcc agg gcc agg atg aag aga ggc tat 1595 Cys LeuTyr Asn Glu Lys Asp Ala Arg Ala Arg Met Lys Arg Gly Tyr 420 425 430 gacaac cct aac ttt gtc ttg tca gaa gtt gat gaa acg gac gac acc 1643 Asp AsnPro Asn Phe Val Leu Ser Glu Val Asp Glu Thr Asp Asp Thr 435 440 445 cagatg gcc tcg agg cac cat cac cac cac cac tgaaagcttt aatgcggtag 1696 GlnMet Ala Ser Arg His His His His His His 
450 455 tttatcacag ttaaattgctaacgcagtca ggcaccgtgt atgaaatcta acaatgcgct 1756 catcgtcatc ctcggcaccgtcaccctgga tgctgtaggc ataggcttgg ttatgccggt 1816 actgccgggc ctcttgcgggatcgacgcga ggctggatgg ccttccccat tatgattctt 1876 ctcgcttccg gcggcatcgggatgcccgcg ttgcaggcca tgctgtccag gcaggtagat 1936 gacgaccatc agggacagcttcaaggatcg ctcgcggctc ttaccagcct aacttcgatc 1996 actggaccgc tgatcgtcacggcgatttat gccgcctcgg cgagcacatg gaacgggttg 2056 gcatggattg taggcgccgccctatacctt gtctgcctcc ccgcgttgcg tcgcggtgca 2116 tggagccggg ccacctcgacctgaatggaa gccggcggca cctcgctaac ggattcacca 2176 ctccaagaat tggagccaatcaattcttgc ggagaactgt gaatgcgcaa accaaccctt 2236 ggcagaacat atccatcgcgtccgccatct ccagcagccg cacgcggcgc atctcgggca 2296 gcgttgggtc ctggccacgggtgcgcatga tcgtgctcct gtcgttgagg acccggctag 2356 gctggcgggg ttgccttactggttagcaga atgaatcacc gatacgcgag cgaacgtgaa 2416 gcgactgctg ctgcaaaacgtctgcgacct gagcaacaac atgaatggtc ttcggtttcc 2476 gtgtttcgta aagtctggaaacgcggaagt cagcgccctg caccattatg ttccggatct 2536 gcatcgcagg atgctgctggctaccctgtg gaacacctac atctgtatta acgaagcgct 2596 ggcattgacc ctgagtgatttttctctggt cccgccgcat ccataccgcc agttgtttac 2656 cctcacaacg ttccagtaaccgggcatgtt catcatcagt aacccgtatc gtgagcatcc 2716 tctctcgttt catcggtatcattaccccca tgaacagaaa ttccccctta cacggaggca 2776 tcaagtgacc aaacaggaaaaaaccgccct taacatggcc cgctttatca gaagccagac 2836 attaacgctt ctggagaaactcaacgagct ggacgcggat gaacaggcag acatctgtga 2896 atcgcttcac gaccacgctgatgagcttta ccgcagctgc ctcgcgcgtt tcggtgatga 2956 cggtgaaaac ctctgacacatgcagctccc ggagacggtc acagcttgtc tgtaagcgga 3016 tgccgggagc agacaagcccgtcagggcgc gtcagcgggt gttggcgggt gtcggggcgc 3076 agccatgacc cagtcacgtagcgatagcgg agtgtatact ggcttaacta tgcggcatca 3136 gagcagattg tactgagagtgcaccatatg cggtgtgaaa taccgcacag atgcgtaagg 3196 agaaaatacc gcatcaggcgctcttccgct tcctcgctca ctgactcgct gcgctcggtc 3256 gttcggctgc ggcgagcggtatcagctcac tcaaaggcgg taatacggtt atccacagaa 3316 tcaggggata acgcaggaaagaacatgtga gcaaaaggcc agcaaaaggc caggaaccgt 3376 aaaaaggccg cgttgctggcgtttttccat 
aggctccgcc cccctgacga gcatcacaaa 3436 aatcgacgct caagtcagaggtggcgaaac ccgacaggac tataaagata ccaggcgttt 3496 ccccctggaa gctccctcgtgcgctctcct gttccgaccc tgccgcttac cggatacctg 3556 tccgcctttc tcccttcgggaagcgtggcg ctttctcata gctcacgctg taggtatctc 3616 agttcggtgt aggtcgttcgctccaagctg ggctgtgtgc acgaaccccc cgttcagccc 3676 gaccgctgcg ccttatccggtaactatcgt cttgagtcca acccggtaag acacgactta 3736 tcgccactgg cagcagccactggtaacagg attagcagag cgaggtatgt aggcggtgct 3796 acagagttct tgaagtggtggcctaactac ggctacacta gaaggacagt atttggtatc 3856 tgcgctctgc tgaagccagttaccttcgga aaaagagttg gtagctcttg atccggcaaa 3916 caaaccaccg ctggtagcggtggttttttt gtttgcaagc agcagattac gcgcagaaaa 3976 aaaggatctc aagaagatcctttgatcttt tctacggggt ctgacgctca gtggaacgaa 4036 aactcacgtt aagggattttggtcatgaga ttatcaaaaa ggatcttcac ctagatcctt 4096 ttaccccggt tgataatcagaaaagcccca aaaacaggaa gattgtataa gcaaatattt 4156 aaattgtaaa cgttaatattttgttaaaat tcgcgttaaa tttttgttaa atcagctcat 4216 tttttaacca ataggccgaaatcggcaaaa tcccttataa atcaaaagaa tagcccgaga 4276 tagggttgag tgttgttccagtttggaaca agagtccact attaaagaac gtggactcca 4336 acgtcaaagg gcgaaaaaccgtctatcagg gcgatggccc actacgtgaa ccatcaccca 4396 aatcaagttt tttggggtcgaggtgccgta aagcactaaa tcggaaccct aaagggagcc 4456 cccgatttag agcttgacggggaaagccgg cgaacgtggc gagaaaggaa gggaagaaag 4516 cgaaaggagc gggcgctagggcgctggcaa gtgtagcggt cacgctgcgc gtaaccacca 4576 cacccgccgc gcttaatgcgccgctacagg gcgcgtaaat caatctaaag tatatatgag 4636 taaacttggt ctgacagttaccaatgctta atcagtgagg cacctatctc agcgatctgt 4696 ctatttcgtt catccatagttgcctgactc cccgtcgtgt agataactac gatacgggag 4756 ggcttaccat ctggccccagtgctgcaatg ataccgcgag acccacgctc accggctcca 4816 gatttatcag caataaaccagccagccgga agggccgagc gcagaagtgg tcctgcaact 4876 ttatccgcct ccatccagtctattaattgt tgccgggaag ctagagtaag tagttcgcca 4936 gttaatagtt tgcgcaacgttgttgccatt gctgcaggca tcgtggtgtc acgctcgtcg 4996 tttggtatgg cttcattcagctccggttcc caacgatcaa ggcgagttac atgatccccc 5056 atgttgtgca aaaaagcggttagctccttc ggtcctccga tcgttgtcag aagtaagttg 5116 
gccgcagtgt tatcactcatggttatggca gcactgcata attctcttac tgtcatgcca 5176 tccgtaagat gcttttctgtgactggtgag tactcaacca agtcattctg agaatagtgt 5236 atgcggcgac cgagttgctcttgcccggcg tcaacacggg ataataccgc gccacatagc 5296 agaactttaa aagtgctcatcattggaaaa cgttcttcgg ggcgaaaact ctcaaggatc 5356 ttaccgctgt tgagatccagttcgatgtaa cccactcgtg cacccaactg atcttcagca 5416 tcttttactt tcaccagcgtttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaa 5476 aagggaataa gggcgacacggaaatgttga atactcatac tcttcctttt tcaatattat 5536 tgaagcattt atcagggttattgtctcatg agcggataca tatttgaatg tatttagaaa 5596 aataaacaaa taggggttccgcgcacattt ccccgaaaag tgccacctga cgtctaagaa 5656 accattatta tcatgacattaacctataaa aataggcgta tcacgaggcc ctttcgtctt 5716 caagaattga tcgatcaa5734 6 5197 DNA Artificial Sequence CDS (300)...(1139) Description ofArtificial Sequence/note = synthetic construct 6 ttctcatgtt tgacagcttatctcatcgac tgcacggtgc accaatgctt ctggcgtcag 60 gcagccatcg gaagctgtggtatggctgtg caggtcgtaa atcactgcat aattcgtgtc 120 gctcaaggcg cactcccgttctggataatg ttttttgcgc cgacatcata acggttctgg 180 caaatattct gaaatgagctgttgacaatt aatcatcggc tcgtataatg tggaattgtg 240 agcggataac aattaatgtgtgaatgtgag cggatacaat ttcacacagg aaacagcgt 299 atg agc aca aaa aag aaacca tta aca caa gag cag ctt gag gac gca 347 Met Ser Thr Lys Lys Lys ProLeu Thr Gln Glu Gln Leu Glu Asp Ala 1 5 10 15 cgt cgc ctt aaa gca atttat gaa aaa aag aaa aat gaa ctt ggc tta 395 Arg Arg Leu Lys Ala Ile TyrGlu Lys Lys Lys Asn Glu Leu Gly Leu 20 25 30 tcc cag gaa tct gtc gca gacaag atg ggg atg ggg cag tca ggc gtt 443 Ser Gln Glu Ser Val Ala Asp LysMet Gly Met Gly Gln Ser Gly Val 35 40 45 ggt gct tta ttt aat ggc atc aatgca tta aat gct tat aac gcg gca 491 Gly Ala Leu Phe Asn Gly Ile Asn AlaLeu Asn Ala Tyr Asn Ala Ala 50 55 60 ttg cta gca aaa att ctc aaa gtt agcgtt gaa gaa ttc cat atg gct 539 Leu Leu Ala Lys Ile Leu Lys Val Ser ValGlu Glu Phe His Met Ala 65 70 75 80 gcc atc cgg aag aaa ctg gtg att gttggt gat gga gcc tgt gga aag 587 Ala Ile Arg Lys Lys Leu Val Ile Val GlyAsp Gly Ala 
Cys Gly Lys 85 90 95 aca tgc ttg ctc ata gtc ttc agc aag gaccag ttc cca gag gtg tat 635 Thr Cys Leu Leu Ile Val Phe Ser Lys Asp GlnPhe Pro Glu Val Tyr 100 105 110 gtg ccc aca gtg ttt gag aac tat gtg gcagat atc gag gtg gat gga 683 Val Pro Thr Val Phe Glu Asn Tyr Val Ala AspIle Glu Val Asp Gly 115 120 125 aag cag gta gag ttg gct ttg tgg gac acagct ggg cag gaa gat tat 731 Lys Gln Val Glu Leu Ala Leu Trp Asp Thr AlaGly Gln Glu Asp Tyr 130 135 140 gat cgc ctg agg ccc ctc tcc tac cca gatacc gat gtt ata ctg atg 779 Asp Arg Leu Arg Pro Leu Ser Tyr Pro Asp ThrAsp Val Ile Leu Met 145 150 155 160 tgt ttt tcc atc gac agc cct gat agttta gaa aac atc cca gaa aag 827 Cys Phe Ser Ile Asp Ser Pro Asp Ser LeuGlu Asn Ile Pro Glu Lys 165 170 175 tgg acc cca gaa gtc aag cat ttc tgtccc aac gtg ccc atc atc ctg 875 Trp Thr Pro Glu Val Lys His Phe Cys ProAsn Val Pro Ile Ile Leu 180 185 190 gtt ggg aat aag aag gat ctt cgg aatgat gag cac aca agg cgg gag 923 Val Gly Asn Lys Lys Asp Leu Arg Asn AspGlu His Thr Arg Arg Glu 195 200 205 cta gcc aag atg aag cag gag ccg gtgaaa cct gaa gaa ggc aga gat 971 Leu Ala Lys Met Lys Gln Glu Pro Val LysPro Glu Glu Gly Arg Asp 210 215 220 atg gca aac agg att ggc gct ttt gggtac atg gag tgt tca gca aag 1019 Met Ala Asn Arg Ile Gly Ala Phe Gly TyrMet Glu Cys Ser Ala Lys 225 230 235 240 acc aaa gat gga gtg aga gag gttttt gaa atg gct acg aga gct gct 1067 Thr Lys Asp Gly Val Arg Glu Val PheGlu Met Ala Thr Arg Ala Ala 245 250 255 ctg caa gct aga cgt ggg aag aaaaaa tct ggt tgc ctt gtc ttg gcc 1115 Leu Gln Ala Arg Arg Gly Lys Lys LysSer Gly Cys Leu Val Leu Ala 260 265 270 tcg agg cac cat cac cac cac cactgaaagcttt aatgcggtag tttatcacag 1169 Ser Arg His His His His His His275 280 ttaaattgct aacgcagtca ggcaccgtgt atgaaatcta acaatgcgctcatcgtcatc 1229 ctcggcaccg tcaccctgga tgctgtaggc ataggcttgg ttatgccggtactgccgggc 1289 ctcttgcggg atcgacgcga ggctggatgg ccttccccat tatgattcttctcgcttccg 1349 gcggcatcgg gatgcccgcg ttgcaggcca tgctgtccag gcaggtagatgacgaccatc 1409 agggacagct 
tcaaggatcg ctcgcggctc ttaccagcct aacttcgatcactggaccgc 1469 tgatcgtcac ggcgatttat gccgcctcgg cgagcacatg gaacgggttggcatggattg 1529 taggcgccgc cctatacctt gtctgcctcc ccgcgttgcg tcgcggtgcatggagccggg 1589 ccacctcgac ctgaatggaa gccggcggca cctcgctaac ggattcaccactccaagaat 1649 tggagccaat caattcttgc ggagaactgt gaatgcgcaa accaacccttggcagaacat 1709 atccatcgcg tccgccatct ccagcagccg cacgcggcgc atctcgggcagcgttgggtc 1769 ctggccacgg gtgcgcatga tcgtgctcct gtcgttgagg acccggctaggctggcgggg 1829 ttgccttact ggttagcaga atgaatcacc gatacgcgag cgaacgtgaagcgactgctg 1889 ctgcaaaacg tctgcgacct gagcaacaac atgaatggtc ttcggtttccgtgtttcgta 1949 aagtctggaa acgcggaagt cagcgccctg caccattatg ttccggatctgcatcgcagg 2009 atgctgctgg ctaccctgtg gaacacctac atctgtatta acgaagcgctggcattgacc 2069 ctgagtgatt tttctctggt cccgccgcat ccataccgcc agttgtttaccctcacaacg 2129 ttccagtaac cgggcatgtt catcatcagt aacccgtatc gtgagcatcctctctcgttt 2189 catcggtatc attaccccca tgaacagaaa ttccccctta cacggaggcatcaagtgacc 2249 aaacaggaaa aaaccgccct taacatggcc cgctttatca gaagccagacattaacgctt 2309 ctggagaaac tcaacgagct ggacgcggat gaacaggcag acatctgtgaatcgcttcac 2369 gaccacgctg atgagcttta ccgcagctgc ctcgcgcgtt tcggtgatgacggtgaaaac 2429 ctctgacaca tgcagctccc ggagacggtc acagcttgtc tgtaagcggatgccgggagc 2489 agacaagccc gtcagggcgc gtcagcgggt gttggcgggt gtcggggcgcagccatgacc 2549 cagtcacgta gcgatagcgg agtgtatact ggcttaacta tgcggcatcagagcagattg 2609 tactgagagt gcaccatatg cggtgtgaaa taccgcacag atgcgtaaggagaaaatacc 2669 gcatcaggcg ctcttccgct tcctcgctca ctgactcgct gcgctcggtcgttcggctgc 2729 ggcgagcggt atcagctcac tcaaaggcgg taatacggtt atccacagaatcaggggata 2789 acgcaggaaa gaacatgtga gcaaaaggcc agcaaaaggc caggaaccgtaaaaaggccg 2849 cgttgctggc gtttttccat aggctccgcc cccctgacga gcatcacaaaaatcgacgct 2909 caagtcagag gtggcgaaac ccgacaggac tataaagata ccaggcgtttccccctggaa 2969 gctccctcgt gcgctctcct gttccgaccc tgccgcttac cggatacctgtccgcctttc 3029 tcccttcggg aagcgtggcg ctttctcata gctcacgctg taggtatctcagttcggtgt 3089 aggtcgttcg ctccaagctg ggctgtgtgc acgaaccccc 
cgttcagcccgaccgctgcg 3149 ccttatccgg taactatcgt cttgagtcca acccggtaag acacgacttatcgccactgg 3209 cagcagccac tggtaacagg attagcagag cgaggtatgt aggcggtgctacagagttct 3269 tgaagtggtg gcctaactac ggctacacta gaaggacagt atttggtatctgcgctctgc 3329 tgaagccagt taccttcgga aaaagagttg gtagctcttg atccggcaaacaaaccaccg 3389 ctggtagcgg tggttttttt gtttgcaagc agcagattac gcgcagaaaaaaaggatctc 3449 aagaagatcc tttgatcttt tctacggggt ctgacgctca gtggaacgaaaactcacgtt 3509 aagggatttt ggtcatgaga ttatcaaaaa ggatcttcac ctagatccttttaccccggt 3569 tgataatcag aaaagcccca aaaacaggaa gattgtataa gcaaatatttaaattgtaaa 3629 cgttaatatt ttgttaaaat tcgcgttaaa tttttgttaa atcagctcattttttaacca 3689 ataggccgaa atcggcaaaa tcccttataa atcaaaagaa tagcccgagatagggttgag 3749 tgttgttcca gtttggaaca agagtccact attaaagaac gtggactccaacgtcaaagg 3809 gcgaaaaacc gtctatcagg gcgatggccc actacgtgaa ccatcacccaaatcaagttt 3869 tttggggtcg aggtgccgta aagcactaaa tcggaaccct aaagggagcccccgatttag 3929 agcttgacgg ggaaagccgg cgaacgtggc gagaaaggaa gggaagaaagcgaaaggagc 3989 gggcgctagg gcgctggcaa gtgtagcggt cacgctgcgc gtaaccaccacacccgccgc 4049 gcttaatgcg ccgctacagg gcgcgtaaat caatctaaag tatatatgagtaaacttggt 4109 ctgacagtta ccaatgctta atcagtgagg cacctatctc agcgatctgtctatttcgtt 4169 catccatagt tgcctgactc cccgtcgtgt agataactac gatacgggagggcttaccat 4229 ctggccccag tgctgcaatg ataccgcgag acccacgctc accggctccagatttatcag 4289 caataaacca gccagccgga agggccgagc gcagaagtgg tcctgcaactttatccgcct 4349 ccatccagtc tattaattgt tgccgggaag ctagagtaag tagttcgccagttaatagtt 4409 tgcgcaacgt tgttgccatt gctgcaggca tcgtggtgtc acgctcgtcgtttggtatgg 4469 cttcattcag ctccggttcc caacgatcaa ggcgagttac atgatcccccatgttgtgca 4529 aaaaagcggt tagctccttc ggtcctccga tcgttgtcag aagtaagttggccgcagtgt 4589 tatcactcat ggttatggca gcactgcata attctcttac tgtcatgccatccgtaagat 4649 gcttttctgt gactggtgag tactcaacca agtcattctg agaatagtgtatgcggcgac 4709 cgagttgctc ttgcccggcg tcaacacggg ataataccgc gccacatagcagaactttaa 4769 aagtgctcat cattggaaaa cgttcttcgg ggcgaaaact ctcaaggatcttaccgctgt 4829 tgagatccag 
ttcgatgtaa cccactcgtg cacccaactg atcttcagcatcttttactt 4889 tcaccagcgt ttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaaaagggaataa 4949 gggcgacacg gaaatgttga atactcatac tcttcctttt tcaatattattgaagcattt 5009 atcagggtta ttgtctcatg agcggataca tatttgaatg tatttagaaaaataaacaaa 5069 taggggttcc gcgcacattt ccccgaaaag tgccacctga cgtctaagaaaccattatta 5129 tcatgacatt aacctataaa aataggcgta tcacgaggcc ctttcgtcttcaagaattga 5189 tcgatcaa 5197 7 5688 DNA Artificial Sequence CDS(300)...(1631) Description of Artificial Sequence/note = syntheticconstruct 7 ttctcatgtt tgacagctta tctcatcgac tgcacggtgc accaatgcttctggcgtcag 60 gcagccatcg gaagctgtgg tatggctgtg caggtcgtaa atcactgcataattcgtgtc 120 gctcaaggcg cactcccgtt ctggataatg ttttttgcgc cgacatcataacggttctgg 180 caaatattct gaaatgagct gttgacaatt aatcatcggc tcgtataatgtggaattgtg 240 agcggataac aattaatgtg tgaatgtgag cggatacaat ttcacacaggaaacagcgt 299 atg agc aca aaa aag aaa cca tta aca caa gag cag ctt gaggac gca 347 Met Ser Thr Lys Lys Lys Pro Leu Thr Gln Glu Gln Leu Glu AspAla 1 5 10 15 cgt cgc ctt aaa gca att tat gaa aaa aag aaa aat gaa cttggc tta 395 Arg Arg Leu Lys Ala Ile Tyr Glu Lys Lys Lys Asn Glu Leu GlyLeu 20 25 30 tcc cag gaa tct gtc gca gac aag atg ggg atg ggg cag tca ggcgtt 443 Ser Gln Glu Ser Val Ala Asp Lys Met Gly Met Gly Gln Ser Gly Val35 40 45 ggt gct tta ttt aat ggc atc aat gca tta aat gct tat aac gcg gca491 Gly Ala Leu Phe Asn Gly Ile Asn Ala Leu Asn Ala Tyr Asn Ala Ala 5055 60 ttg cta gca aaa att ctc aaa gtt agc gtt gaa gaa ttc gca gct cat539 Leu Leu Ala Lys Ile Leu Lys Val Ser Val Glu Glu Phe Ala Ala His 6570 75 80 atg aag gag acg cgg ggc gac gga ggg agc gcc ccc ttc tgc acc cgc587 Met Lys Glu Thr Arg Gly Asp Gly Gly Ser Ala Pro Phe Cys Thr Arg 8590 95 ctc aac cac tcg tat cca ggc atg tgg gcg ccc gag gca cgg ggc aac635 Leu Asn His Ser Tyr Pro Gly Met Trp Ala Pro Glu Ala Arg Gly Asn 100105 110 ctc aca cgc ccc cca ggg ccc ggc gag gac tgt ggc tcg gtg tcc gtg683 Leu Thr Arg Pro Pro Gly Pro Gly Glu Asp Cys Gly Ser Val Ser Val 115120 125 
gcc ttc ccg atc acc atg ctg atc acc ggc ttc gtg ggc aac gcg ctg 731
Ala Phe Pro Ile Thr Met Leu Ile Thr Gly Phe Val Gly Asn Ala Leu 130 135 140
gcc atg ctg ctc gtg tcg cgt agc tac cgg cgt cgg gag agc aag cgc 779
Ala Met Leu Leu Val Ser Arg Ser Tyr Arg Arg Arg Glu Ser Lys Arg 145 150 155 160
aag aag tcg ttc ctg ttg tgc atc ggc tgg ctg gcg ctc act gac ctg 827
Lys Lys Ser Phe Leu Leu Cys Ile Gly Trp Leu Ala Leu Thr Asp Leu 165 170 175
gtc ggg cag ctg ctc aca agc ccc gtg gtc atc ttg gtg tac cta tcc 875
Val Gly Gln Leu Leu Thr Ser Pro Val Val Ile Leu Val Tyr Leu Ser 180 185 190
aag cag cgc tgg gag cag ctc gac ccg tcg ggg cgc ctg tgc acc ttc 923
Lys Gln Arg Trp Glu Gln Leu Asp Pro Ser Gly Arg Leu Cys Thr Phe 195 200 205
ttt ggt ctg acc atg act gtt ttc ggg ctg tcc tcg ctc ttc atc gcc 971
Phe Gly Leu Thr Met Thr Val Phe Gly Leu Ser Ser Leu Phe Ile Ala 210 215 220
agc gcc atg gct gtc gag agg gcg ctg gcc atc cgt gcg cca cac tgg 1019
Ser Ala Met Ala Val Glu Arg Ala Leu Ala Ile Arg Ala Pro His Trp 225 230 235 240
tac gcg agc cac atg aag acg cgt gcc act cgc gcc gtc ctg ctg ggc 1067
Tyr Ala Ser His Met Lys Thr Arg Ala Thr Arg Ala Val Leu Leu Gly 245 250 255
gtg tgg ctg gca gtg ctc gcc ttc gcc ctg cta cct gtg ctg ggt gtg 1115
Val Trp Leu Ala Val Leu Ala Phe Ala Leu Leu Pro Val Leu Gly Val 260 265 270
ggt cag tac acc atc cag tgg ccc ggg acg tgg tgc ttc atc agc acc 1163
Gly Gln Tyr Thr Ile Gln Trp Pro Gly Thr Trp Cys Phe Ile Ser Thr 275 280 285
gga cga ggg gac aac ggg acg agc tct tca cac aac tgg ggc aac ctt 1211
Gly Arg Gly Asp Asn Gly Thr Ser Ser Ser His Asn Trp Gly Asn Leu 290 295 300
ttc ttc gcc tcc acc ttt gcc ttc ctg ggc ctc ttg gcg ctg gcc atc 1259
Phe Phe Ala Ser Thr Phe Ala Phe Leu Gly Leu Leu Ala Leu Ala Ile 305 310 315 320
acc ttc acc tgc aac ctg gcc acc att aag gct ctg gtg tcc cgc tgc 1307
Thr Phe Thr Cys Asn Leu Ala Thr Ile Lys Ala Leu Val Ser Arg Cys 325 330 335
cgg gca aag gcg gca gca tca cag tcc agt gcc cag tgg ggc cgg atc 1355
Arg Ala Lys Ala Ala Ala Ser Gln Ser Ser Ala Gln Trp Gly Arg Ile 340 345 350
acg acc gag acg gcc atc cag ctc atg ggg atc atg tgc gtg ctg tcg 1403
Thr Thr Glu Thr Ala Ile Gln Leu Met Gly Ile Met Cys Val Leu Ser 355 360 365
gtc tgc tgg tcg ccc cta ctg ata atg atg ttg aaa atg atc ttc aat 1451
Val Cys Trp Ser Pro Leu Leu Ile Met Met Leu Lys Met Ile Phe Asn 370 375 380
cag aca tca gtt gag cac tgc aag aca gac aca gga aag cag aaa gaa 1499
Gln Thr Ser Val Glu His Cys Lys Thr Asp Thr Gly Lys Gln Lys Glu 385 390 395 400
tgc aac ttc ttc tta ata gct gtt cgc ctg gct tca ctg aac cag ata 1547
Cys Asn Phe Phe Leu Ile Ala Val Arg Leu Ala Ser Leu Asn Gln Ile 405 410 415
ttg gat ccc tgg gtt tat ctg ctg cta aga aag att ctt ctt cgg aag 1595
Leu Asp Pro Trp Val Tyr Leu Leu Leu Arg Lys Ile Leu Leu Arg Lys 420 425 430
ttt tgc cag gcc tcg agg cac cat cac cac cac cac tgaagcttta 1641
Phe Cys Gln Ala Ser Arg His His His His His His 435 440
atgcggtagt ttatcacagt taaattgcta acgcagtcag gcaccgtgta tgaaatctaa 1701
caatgcgctc atcgtcatcc tcggcaccgt caccctggat gctgtaggca taggcttggt 1761
tatgccggta ctgccgggcc tcttgcggga tcgacgcgag gctggatggc cttccccatt 1821
atgattcttc tcgcttccgg cggcatcggg atgcccgcgt tgcaggccat gctgtccagg 1881
caggtagatg acgaccatca gggacagctt caaggatcgc tcgcggctct taccagccta 1941
acttcgatca ctggaccgct gatcgtcacg gcgatttatg ccgcctcggc gagcacatgg 2001
aacgggttgg catggattgt aggcgccgcc ctataccttg tctgcctccc cgcgttgcgt 2061
cgcggtgcat ggagccgggc cacctcgacc tgaatggaag ccggcggcac ctcgctaacg 2121
gattcaccac tccaagaatt ggagccaatc aattcttgcg gagaactgtg aatgcgcaaa 2181
ccaacccttg gcagaacata tccatcgcgt ccgccatctc cagcagccgc acgcggcgca 2241
tctcgggcag cgttgggtcc tggccacggg tgcgcatgat cgtgctcctg tcgttgagga 2301
cccggctagg ctggcggggt tgccttactg gttagcagaa tgaatcaccg atacgcgagc 2361
gaacgtgaag cgactgctgc tgcaaaacgt ctgcgacctg agcaacaaca tgaatggtct 2421
tcggtttccg tgtttcgtaa agtctggaaa cgcggaagtc agcgccctgc accattatgt 2481
tccggatctg catcgcagga tgctgctggc taccctgtgg aacacctaca tctgtattaa 2541
cgaagcgctg gcattgaccc tgagtgattt ttctctggtc ccgccgcatc cataccgcca 2601
gttgtttacc ctcacaacgt tccagtaacc gggcatgttc atcatcagta acccgtatcg 2661
tgagcatcct ctctcgtttc atcggtatca ttacccccat gaacagaaat tcccccttac 2721
acggaggcat caagtgacca aacaggaaaa aaccgccctt aacatggccc gctttatcag 2781
aagccagaca ttaacgcttc tggagaaact caacgagctg gacgcggatg aacaggcaga 2841
catctgtgaa tcgcttcacg accacgctga tgagctttac cgcagctgcc tcgcgcgttt 2901
cggtgatgac ggtgaaaacc tctgacacat gcagctcccg gagacggtca cagcttgtct 2961
gtaagcggat gccgggagca gacaagcccg tcagggcgcg tcagcgggtg ttggcgggtg 3021
tcggggcgca gccatgaccc agtcacgtag cgatagcgga gtgtatactg gcttaactat 3081
gcggcatcag agcagattgt actgagagtg caccatatgc ggtgtgaaat accgcacaga 3141
tgcgtaagga gaaaataccg catcaggcgc tcttccgctt cctcgctcac tgactcgctg 3201
cgctcggtcg ttcggctgcg gcgagcggta tcagctcact caaaggcggt aatacggtta 3261
tccacagaat caggggataa cgcaggaaag aacatgtgag caaaaggcca gcaaaaggcc 3321
aggaaccgta aaaaggccgc gttgctggcg tttttccata ggctccgccc ccctgacgag 3381
catcacaaaa atcgacgctc aagtcagagg tggcgaaacc cgacaggact ataaagatac 3441
caggcgtttc cccctggaag ctccctcgtg cgctctcctg ttccgaccct gccgcttacc 3501
ggatacctgt ccgcctttct cccttcggga agcgtggcgc tttctcatag ctcacgctgt 3561
aggtatctca gttcggtgta ggtcgttcgc tccaagctgg gctgtgtgca cgaacccccc 3621
gttcagcccg accgctgcgc cttatccggt aactatcgtc ttgagtccaa cccggtaaga 3681
cacgacttat cgccactggc agcagccact ggtaacagga ttagcagagc gaggtatgta 3741
ggcggtgcta cagagttctt gaagtggtgg cctaactacg gctacactag aaggacagta 3801
tttggtatct gcgctctgct gaagccagtt accttcggaa aaagagttgg tagctcttga 3861
tccggcaaac aaaccaccgc tggtagcggt ggtttttttg tttgcaagca gcagattacg 3921
cgcagaaaaa aaggatctca agaagatcct ttgatctttt ctacggggtc tgacgctcag 3981
tggaacgaaa actcacgtta agggattttg gtcatgagat tatcaaaaag gatcttcacc 4041
tagatccttt taccccggtt gataatcaga aaagccccaa aaacaggaag attgtataag 4101
caaatattta aattgtaaac gttaatattt tgttaaaatt cgcgttaaat ttttgttaaa 4161
tcagctcatt ttttaaccaa taggccgaaa tcggcaaaat cccttataaa tcaaaagaat 4221
agcccgagat agggttgagt gttgttccag tttggaacaa gagtccacta ttaaagaacg 4281
tggactccaa cgtcaaaggg cgaaaaaccg tctatcaggg cgatggccca ctacgtgaac 4341
catcacccaa atcaagtttt ttggggtcga ggtgccgtaa agcactaaat cggaacccta 4401
aagggagccc ccgatttaga gcttgacggg gaaagccggc gaacgtggcg agaaaggaag 4461
ggaagaaagc gaaaggagcg ggcgctaggg cgctggcaag tgtagcggtc acgctgcgcg 4521
taaccaccac acccgccgcg cttaatgcgc cgctacaggg cgcgtaaatc aatctaaagt 4581
atatatgagt aaacttggtc tgacagttac caatgcttaa tcagtgaggc acctatctca 4641
gcgatctgtc tatttcgttc atccatagtt gcctgactcc ccgtcgtgta gataactacg 4701
atacgggagg gcttaccatc tggccccagt gctgcaatga taccgcgaga cccacgctca 4761
ccggctccag atttatcagc aataaaccag ccagccggaa gggccgagcg cagaagtggt 4821
cctgcaactt tatccgcctc catccagtct attaattgtt gccgggaagc tagagtaagt 4881
agttcgccag ttaatagttt gcgcaacgtt gttgccattg ctgcaggcat cgtggtgtca 4941
cgctcgtcgt ttggtatggc ttcattcagc tccggttccc aacgatcaag gcgagttaca 5001
tgatccccca tgttgtgcaa aaaagcggtt agctccttcg gtcctccgat cgttgtcaga 5061
agtaagttgg ccgcagtgtt atcactcatg gttatggcag cactgcataa ttctcttact 5121
gtcatgccat ccgtaagatg cttttctgtg actggtgagt actcaaccaa gtcattctga 5181
gaatagtgta tgcggcgacc gagttgctct tgcccggcgt caacacggga taataccgcg 5241
ccacatagca gaactttaaa agtgctcatc attggaaaac gttcttcggg gcgaaaactc 5301
tcaaggatct taccgctgtt gagatccagt tcgatgtaac ccactcgtgc acccaactga 5361
tcttcagcat cttttacttt caccagcgtt tctgggtgag caaaaacagg aaggcaaaat 5421
gccgcaaaaa agggaataag ggcgacacgg aaatgttgaa tactcatact cttccttttt 5481
caatattatt gaagcattta tcagggttat tgtctcatga gcggatacat atttgaatgt 5541
atttagaaaa ataaacaaat aggggttccg cgcacatttc cccgaaaagt gccacctgac 5601
gtctaagaaa ccattattat catgacatta acctataaaa ataggcgtat cacgaggccc 5661
tttcgtcttc aagaattgat cgatcaa 5688

8 6587 DNA Artificial Sequence CDS (300)...(2267) Description of Artificial Sequence/note = synthetic construct
8
ttctcatgtt tgacagctta tctcatcgac tgcacggtgc accaatgctt ctggcgtcag 60
gcagccatcg gaagctgtgg tatggctgtg caggtcgtaa atcactgcat aattcgtgtc 120
gctcaaggcg cactcccgtt ctggataatg ttttttgcgc cgacatcata acggttctgg 180
caaatattct gaaatgagct gttgacaatt aatcatcggc tcgtataatg tggaattgtg 240
agcggataac aattaatgtg tgaatgtgag cggatacaat ttcacacagg aaacagcgt 299
atg agc aca aaa aag aaa cca tta aca caa gag cag ctt gag gac gca 347
Met Ser Thr Lys Lys Lys Pro Leu Thr Gln Glu Gln Leu Glu Asp Ala 1 5 10 15
cgt cgc ctt aaa gca att tat gaa aaa aag aaa aat gaa ctt ggc tta 395
Arg Arg Leu Lys Ala Ile Tyr Glu Lys Lys Lys Asn Glu Leu Gly Leu 20 25 30
tcc cag gaa tct gtc gca gac aag atg ggg atg ggg cag tca ggc gtt 443
Ser Gln Glu Ser Val Ala Asp Lys Met Gly Met Gly Gln Ser Gly Val 35 40 45
ggt gct tta ttt aat ggc atc aat gca tta aat gct tat aac gcg gca 491
Gly Ala Leu Phe Asn Gly Ile Asn Ala Leu Asn Ala Tyr Asn Ala Ala 50 55 60
ttg cta gca aaa att ctc aaa gtt agc gtt gaa gaa ttc gca gct cat 539
Leu Leu Ala Lys Ile Leu Lys Val Ser Val Glu Glu Phe Ala Ala His 65 70 75 80
atg aag gag acg cgg ggc gac gga ggg agc gcc ccc ttc tgc acc cgc 587
Met Lys Glu Thr Arg Gly Asp Gly Gly Ser Ala Pro Phe Cys Thr Arg 85 90 95
ctc aac cac tcg tat cca ggc atg tgg gcg ccc gag gca cgg ggc aac 635
Leu Asn His Ser Tyr Pro Gly Met Trp Ala Pro Glu Ala Arg Gly Asn 100 105 110
ctc aca cgc ccc cca ggg ccc ggc gag gac tgt ggc tcg gtg tcc gtg 683
Leu Thr Arg Pro Pro Gly Pro Gly Glu Asp Cys Gly Ser Val Ser Val 115 120 125
gcc ttc ccg atc acc atg ctg atc acc ggc ttc gtg ggc aac gcg ctg 731
Ala Phe Pro Ile Thr Met Leu Ile Thr Gly Phe Val Gly Asn Ala Leu 130 135 140
gcc atg ctg ctc gtg tcg cgt agc tac cgg cgt cgg gag agc aag cgc 779
Ala Met Leu Leu Val Ser Arg Ser Tyr Arg Arg Arg Glu Ser Lys Arg 145 150 155 160
aag aag tcg ttc ctg ttg tgc atc ggc tgg ctg gcg ctc act gac ctg 827
Lys Lys Ser Phe Leu Leu Cys Ile Gly Trp Leu Ala Leu Thr Asp Leu 165 170 175
gtc ggg cag ctg ctc aca agc ccc gtg gtc atc ttg gtg tac cta tcc 875
Val Gly Gln Leu Leu Thr Ser Pro Val Val Ile Leu Val Tyr Leu Ser 180 185 190
aag cag cgc tgg gag cag ctc gac ccg tcg ggg cgc ctg tgc acc ttc 923
Lys Gln Arg Trp Glu Gln Leu Asp Pro Ser Gly Arg Leu Cys Thr Phe 195 200 205
ttt ggt ctg acc atg act gtt ttc ggg ctg tcc tcg ctc ttc atc gcc 971
Phe Gly Leu Thr Met Thr Val Phe Gly Leu Ser Ser Leu Phe Ile Ala 210 215 220
agc gcc atg gct gtc gag agg gcg ctg gcc atc cgt gcg cca cac tgg 1019
Ser Ala Met Ala Val Glu Arg Ala Leu Ala Ile Arg Ala Pro His Trp 225 230 235 240
tac gcg agc cac atg aag acg cgt gcc act cgc gcc gtc ctg ctg ggc 1067
Tyr Ala Ser His Met Lys Thr Arg Ala Thr Arg Ala Val Leu Leu Gly 245 250 255
gtg tgg ctg gca gtg ctc gcc ttc gcc ctg cta cct gtg ctg ggt gtg 1115
Val Trp Leu Ala Val Leu Ala Phe Ala Leu Leu Pro Val Leu Gly Val 260 265 270
ggt cag tac acc atc cag tgg ccc ggg acg tgg tgc ttc atc agc acc 1163
Gly Gln Tyr Thr Ile Gln Trp Pro Gly Thr Trp Cys Phe Ile Ser Thr 275 280 285
gga cga ggg gac aac ggg acg agc tct tca cac aac tgg ggc aac ctt 1211
Gly Arg Gly Asp Asn Gly Thr Ser Ser Ser His Asn Trp Gly Asn Leu 290 295 300
ttc ttc gcc tcc acc ttt gcc ttc ctg ggc ctc ttg gcg ctg gcc atc 1259
Phe Phe Ala Ser Thr Phe Ala Phe Leu Gly Leu Leu Ala Leu Ala Ile 305 310 315 320
acc ttc acc tgc aac ctg gcc acc att aag gct ctg gtg tcc cgc tgc 1307
Thr Phe Thr Cys Asn Leu Ala Thr Ile Lys Ala Leu Val Ser Arg Cys 325 330 335
cgg gca aag gcg gca gca tca cag tcc agt gcc cag tgg ggc cgg atc 1355
Arg Ala Lys Ala Ala Ala Ser Gln Ser Ser Ala Gln Trp Gly Arg Ile 340 345 350
acg acc gag acg gcc atc cag ctc atg ggg atc atg tgc gtg ctg tcg 1403
Thr Thr Glu Thr Ala Ile Gln Leu Met Gly Ile Met Cys Val Leu Ser 355 360 365
gtc tgc tgg tcg ccc cta ctg ata atg atg ttg aaa atg atc ttc aat 1451
Val Cys Trp Ser Pro Leu Leu Ile Met Met Leu Lys Met Ile Phe Asn 370 375 380
cag aca tca gtt gag cac tgc aag aca gac aca gga aag cag aaa gaa 1499
Gln Thr Ser Val Glu His Cys Lys Thr Asp Thr Gly Lys Gln Lys Glu 385 390 395 400
tgc aac ttc ttc tta ata gct gtt cgc ctg gct tca ctg aac cag ata 1547
Cys Asn Phe Phe Leu Ile Ala Val Arg Leu Ala Ser Leu Asn Gln Ile 405 410 415
ttg gat ccc tgg gtt tat ctg ctg cta aga aag att ctt ctt cgg aag 1595
Leu Asp Pro Trp Val Tyr Leu Leu Leu Arg Lys Ile Leu Leu Arg Lys 420 425 430
ttt tgc cag gta att cat gaa aat aat gag cag aag gat gaa att cag 1643
Phe Cys Gln Val Ile His Glu Asn Asn Glu Gln Lys Asp Glu Ile Gln 435 440 445
cgt gag aac agg aac gtc tca cac agt ggg caa cac gaa gag gcc aga 1691
Arg Glu Asn Arg Asn Val Ser His Ser Gly Gln His Glu Glu Ala Arg 450 455 460
gac agt gag aag agc aaa acc atc cct ggc ctg ttc tcc att ctg ctg 1739
Asp Ser Glu Lys Ser Lys Thr Ile Pro Gly Leu Phe Ser Ile Leu Leu 465 470 475 480
cag gct gac cct ggt gct cgt cct tat cag caa gcc tcg agc ctg gtg 1787
Gln Ala Asp Pro Gly Ala Arg Pro Tyr Gln Gln Ala Ser Ser Leu Val 485 490 495
cca cgc gga tcc gtt cga gaa atc tac gag atg tat gaa gcg gtt agc 1835
Pro Arg Gly Ser Val Arg Glu Ile Tyr Glu Met Tyr Glu Ala Val Ser 500 505 510
atg cag ccg tca ctt aga agt gag tat gag tac cct gtt ttt tct cat 1883
Met Gln Pro Ser Leu Arg Ser Glu Tyr Glu Tyr Pro Val Phe Ser His 515 520 525
gtt cag gca ggg atg ttc tca cct aag ctt aga acc ttt acc aaa ggt 1931
Val Gln Ala Gly Met Phe Ser Pro Lys Leu Arg Thr Phe Thr Lys Gly 530 535 540
gat gcg gag aga tgg gta agc aca acc aaa aaa gcc agt gat tct gca 1979
Asp Ala Glu Arg Trp Val Ser Thr Thr Lys Lys Ala Ser Asp Ser Ala 545 550 555 560
ttc tgg ctt gag gtt gaa ggt aat tcc atg acc gca cca aca ggc tcc 2027
Phe Trp Leu Glu Val Glu Gly Asn Ser Met Thr Ala Pro Thr Gly Ser 565 570 575
aag cca agc ttt cct gac gga atg tta att ctc gtt gac cct gag cag 2075
Lys Pro Ser Phe Pro Asp Gly Met Leu Ile Leu Val Asp Pro Glu Gln 580 585 590
gct gtt gag cca ggt gat ttc tgc ata gcc aga ctt ggg ggt gat gag 2123
Ala Val Glu Pro Gly Asp Phe Cys Ile Ala Arg Leu Gly Gly Asp Glu 595 600 605
ttt acc ttc aag aaa ctg atc agg gat agc ggt cag gtg ttt tta caa 2171
Phe Thr Phe Lys Lys Leu Ile Arg Asp Ser Gly Gln Val Phe Leu Gln 610 615 620
cca cta aac cca cag tac cca atg atc cca tgc aat gag agt tgt tcc 2219
Pro Leu Asn Pro Gln Tyr Pro Met Ile Pro Cys Asn Glu Ser Cys Ser 625 630 635 640
gtt gtg ggg aaa gtt atc gct agt cag tgg cct gaa gag acg ttt ggc 2267
Val Val Gly Lys Val Ile Ala Ser Gln Trp Pro Glu Glu Thr Phe Gly 645 650 655
tgatcggcaa ggtgttctgg tcggcgcata gctgataaca attgagcaag aatcttcatc 2327
gaattagggg aattttcact cccctcagaa cataacatag taaatggatt gaattatgaa 2387
gaatggtttt tatgcgactt accgcagcaa aaataaaggg aaagataagc gctcaataaa 2447
cctgtctgtt ttccttaatt ctctgctggc tgataatcat cacctgcagg ttggctccaa 2507
ttatttgtat attcataaaa tcgataagct ttaatgcggt agtttatcac agttaaattg 2567
ctaacgcagt caggcaccgt gtatgaaatc taacaatgcg ctcatcgtca tcctcggcac 2627
cgtcaccctg gatgctgtag gcataggctt ggttatgccg gtactgccgg gcctcttgcg 2687
ggatcgacgc gaggctggat ggccttcccc attatgattc ttctcgcttc cggcggcatc 2747
gggatgcccg cgttgcaggc catgctgtcc aggcaggtag atgacgacca tcagggacag 2807
cttcaaggat cgctcgcggc tcttaccagc ctaacttcga tcactggacc gctgatcgtc 2867
acggcgattt atgccgcctc ggcgagcaca tggaacgggt tggcatggat tgtaggcgcc 2927
gccctatacc ttgtctgcct ccccgcgttg cgtcgcggtg catggagccg ggccacctcg 2987
acctgaatgg aagccggcgg cacctcgcta acggattcac cactccaaga attggagcca 3047
atcaattctt gcggagaact gtgaatgcgc aaaccaaccc ttggcagaac atatccatcg 3107
cgtccgccat ctccagcagc cgcacgcggc gcatctcggg cagcgttggg tcctggccac 3167
gggtgcgcat gatcgtgctc ctgtcgttga ggacccggct aggctggcgg ggttgcctta 3227
ctggttagca gaatgaatca ccgatacgcg agcgaacgtg aagcgactgc tgctgcaaaa 3287
cgtctgcgac ctgagcaaca acatgaatgg tcttcggttt ccgtgtttcg taaagtctgg 3347
aaacgcggaa gtcagcgccc tgcaccatta tgttccggat ctgcatcgca ggatgctgct 3407
ggctaccctg tggaacacct acatctgtat taacgaagcg ctggcattga ccctgagtga 3467
tttttctctg gtcccgccgc atccataccg ccagttgttt accctcacaa cgttccagta 3527
accgggcatg ttcatcatca gtaacccgta tcgtgagcat cctctctcgt ttcatcggta 3587
tcattacccc catgaacaga aattccccct tacacggagg catcaagtga ccaaacagga 3647
aaaaaccgcc cttaacatgg cccgctttat cagaagccag acattaacgc ttctggagaa 3707
actcaacgag ctggacgcgg atgaacaggc agacatctgt gaatcgcttc acgaccacgc 3767
tgatgagctt taccgcagct gcctcgcgcg tttcggtgat gacggtgaaa acctctgaca 3827
catgcagctc ccggagacgg tcacagcttg tctgtaagcg gatgccggga gcagacaagc 3887
ccgtcagggc gcgtcagcgg gtgttggcgg gtgtcggggc gcagccatga cccagtcacg 3947
tagcgatagc ggagtgtata ctggcttaac tatgcggcat cagagcagat tgtactgaga 4007
gtgcaccata tgcggtgtga aataccgcac agatgcgtaa ggagaaaata ccgcatcagg 4067
cgctcttccg cttcctcgct cactgactcg ctgcgctcgg tcgttcggct gcggcgagcg 4127
gtatcagctc actcaaaggc ggtaatacgg ttatccacag aatcagggga taacgcagga 4187
aagaacatgt gagcaaaagg ccagcaaaag gccaggaacc gtaaaaaggc cgcgttgctg 4247
gcgtttttcc ataggctccg cccccctgac gagcatcaca aaaatcgacg ctcaagtcag 4307
aggtggcgaa acccgacagg actataaaga taccaggcgt ttccccctgg aagctccctc 4367
gtgcgctctc ctgttccgac cctgccgctt accggatacc tgtccgcctt tctcccttcg 4427
ggaagcgtgg cgctttctca tagctcacgc tgtaggtatc tcagttcggt gtaggtcgtt 4487
cgctccaagc tgggctgtgt gcacgaaccc cccgttcagc ccgaccgctg cgccttatcc 4547
ggtaactatc gtcttgagtc caacccggta agacacgact tatcgccact ggcagcagcc 4607
actggtaaca ggattagcag agcgaggtat gtaggcggtg ctacagagtt cttgaagtgg 4667
tggcctaact acggctacac tagaaggaca gtatttggta tctgcgctct gctgaagcca 4727
gttaccttcg gaaaaagagt tggtagctct tgatccggca aacaaaccac cgctggtagc 4787
ggtggttttt ttgtttgcaa gcagcagatt acgcgcagaa aaaaaggatc tcaagaagat 4847
cctttgatct tttctacggg gtctgacgct cagtggaacg aaaactcacg ttaagggatt 4907
ttggtcatga gattatcaaa aaggatcttc acctagatcc ttttaccccg gttgataatc 4967
agaaaagccc caaaaacagg aagattgtat aagcaaatat ttaaattgta aacgttaata 5027
ttttgttaaa attcgcgtta aatttttgtt aaatcagctc attttttaac caataggccg 5087
aaatcggcaa aatcccttat aaatcaaaag aatagcccga gatagggttg agtgttgttc 5147
cagtttggaa caagagtcca ctattaaaga acgtggactc caacgtcaaa gggcgaaaaa 5207
ccgtctatca gggcgatggc ccactacgtg aaccatcacc caaatcaagt tttttggggt 5267
cgaggtgccg taaagcacta aatcggaacc ctaaagggag cccccgattt agagcttgac 5327
ggggaaagcc ggcgaacgtg gcgagaaagg aagggaagaa agcgaaagga gcgggcgcta 5387
gggcgctggc aagtgtagcg gtcacgctgc gcgtaaccac cacacccgcc gcgcttaatg 5447
cgccgctaca gggcgcgtaa atcaatctaa agtatatatg agtaaacttg gtctgacagt 5507
taccaatgct taatcagtga ggcacctatc tcagcgatct gtctatttcg ttcatccata 5567
gttgcctgac tccccgtcgt gtagataact acgatacggg agggcttacc atctggcccc 5627
agtgctgcaa tgataccgcg agacccacgc tcaccggctc cagatttatc agcaataaac 5687
cagccagccg gaagggccga gcgcagaagt ggtcctgcaa ctttatccgc ctccatccag 5747
tctattaatt gttgccggga agctagagta agtagttcgc cagttaatag tttgcgcaac 5807
gttgttgcca ttgctgcagg catcgtggtg tcacgctcgt cgtttggtat ggcttcattc 5867
agctccggtt cccaacgatc aaggcgagtt acatgatccc ccatgttgtg caaaaaagcg 5927
gttagctcct tcggtcctcc gatcgttgtc agaagtaagt tggccgcagt gttatcactc 5987
atggttatgg cagcactgca taattctctt actgtcatgc catccgtaag atgcttttct 6047
gtgactggtg agtactcaac caagtcattc tgagaatagt gtatgcggcg accgagttgc 6107
tcttgcccgg cgtcaacacg ggataatacc gcgccacata gcagaacttt aaaagtgctc 6167
atcattggaa aacgttcttc ggggcgaaaa ctctcaagga tcttaccgct gttgagatcc 6227
agttcgatgt aacccactcg tgcacccaac tgatcttcag catcttttac tttcaccagc 6287
gtttctgggt gagcaaaaac aggaaggcaa aatgccgcaa aaaagggaat aagggcgaca 6347
cggaaatgtt gaatactcat actcttcctt tttcaatatt attgaagcat ttatcagggt 6407
tattgtctca tgagcggata catatttgaa tgtatttaga aaaataaaca aataggggtt 6467
ccgcgcacat ttccccgaaa agtgccacct gacgtctaag aaaccattat tatcatgaca 6527
ttaacctata aaaataggcg tatcacgagg ccctttcgtc ttcaagaatt gatcgatcaa 6587

9 711 DNA Artificial Sequence CDS (1)...(708) Description of Artificial Sequence/note = synthetic construct
9
agc aca aaa aag aaa cca tta aca caa gag cag ctt gag gac gca cgt 48
Ser Thr Lys Lys Lys Pro Leu Thr Gln Glu Gln Leu Glu Asp Ala Arg 1 5 10 15
cgc ctt aaa gca att tat gaa aaa aag aaa aat gaa ctt ggc tta tcc 96
Arg Leu Lys Ala Ile Tyr Glu Lys Lys Lys Asn Glu Leu Gly Leu Ser 20 25 30
cag gaa tct gtc gca gac aag atg ggg atg ggg cag tca ggc gtt ggt 144
Gln Glu Ser Val Ala Asp Lys Met Gly Met Gly Gln Ser Gly Val Gly 35 40 45
gct tta ttt aat ggc atc aat gca tta aat gct tat aac gcg gca ttg 192
Ala Leu Phe Asn Gly Ile Asn Ala Leu Asn Ala Tyr Asn Ala Ala Leu 50 55 60
cta gca aaa att ctc aaa gtt agc gtt gaa gaa ttc agc cct tca atc 240
Leu Ala Lys Ile Leu Lys Val Ser Val Glu Glu Phe Ser Pro Ser Ile 65 70 75 80
gct cga gaa atc tac gag atg tat gaa gcg gtt agc atg cag ccg tca 288
Ala Arg Glu Ile Tyr Glu Met Tyr Glu Ala Val Ser Met Gln Pro Ser 85 90 95
ctt aga agt gag tat gag tac cct gtt ttt tct cat gtt cag gca ggg 336
Leu Arg Ser Glu Tyr Glu Tyr Pro Val Phe Ser His Val Gln Ala Gly 100 105 110
atg ttc tca cct aag ctt aga acc ttt acc aaa ggt gat gcg gag aga 384
Met Phe Ser Pro Lys Leu Arg Thr Phe Thr Lys Gly Asp Ala Glu Arg 115 120 125
tgg gta agc aca acc aaa aaa gcc agt gat tct gca ttc tgg ctt gag 432
Trp Val Ser Thr Thr Lys Lys Ala Ser Asp Ser Ala Phe Trp Leu Glu 130 135 140
gtt gaa ggt aat tcc atg acc gca cca aca ggc tcc aag cca agc ttt 480
Val Glu Gly Asn Ser Met Thr Ala Pro Thr Gly Ser Lys Pro Ser Phe 145 150 155 160
cct gac gga atg tta att ctc gtt gac cct gag cag gct gtt gag cca 528
Pro Asp Gly Met Leu Ile Leu Val Asp Pro Glu Gln Ala Val Glu Pro 165 170 175
ggt gat ttc tgc ata gcc aga ctt ggg ggt gat gag ttt acc ttc aag 576
Gly Asp Phe Cys Ile Ala Arg Leu Gly Gly Asp Glu Phe Thr Phe Lys 180 185 190
aaa ctg atc agg gat agc ggt cag gtg ttt tta caa cca cta aac cca 624
Lys Leu Ile Arg Asp Ser Gly Gln Val Phe Leu Gln Pro Leu Asn Pro 195 200 205
cag tac cca atg atc cca tgc aat gag agt tgt tcc gtt gtg ggg aaa 672
Gln Tyr Pro Met Ile Pro Cys Asn Glu Ser Cys Ser Val Val Gly Lys 210 215 220
gtt atc gct agt cag tgg cct gaa gag acg ttt ggc tga 711
Val Ile Ala Ser Gln Trp Pro Glu Glu Thr Phe Gly 225 230 235

10 276 DNA Artificial Sequence CDS (1)...(276) Description of Artificial Sequence/note = synthetic construct
10
agc aca aaa aag aaa cca tta aca caa gag cag ctt gag gac gca cgt 48
Ser Thr Lys Lys Lys Pro Leu Thr Gln Glu Gln Leu Glu Asp Ala Arg 1 5 10 15
cgc ctt aaa gca att tat gaa aaa aag aaa aat gaa ctt ggc tta tcc 96
Arg Leu Lys Ala Ile Tyr Glu Lys Lys Lys Asn Glu Leu Gly Leu Ser 20 25 30
cag gaa tct gtc gca gac aag atg ggg atg ggg cag tca ggc gtt ggt 144
Gln Glu Ser Val Ala Asp Lys Met Gly Met Gly Gln Ser Gly Val Gly 35 40 45
gct tta ttt aat ggc atc aat gca tta aat gct tat aac gcg gca ttg 192
Ala Leu Phe Asn Gly Ile Asn Ala Leu Asn Ala Tyr Asn Ala Ala Leu 50 55 60
cta gca aaa att ctc aaa gtt agc gtt gaa gaa ttc agc cct tca atc 240
Leu Ala Lys Ile Leu Lys Val Ser Val Glu Glu Phe Ser Pro Ser Ile 65 70 75 80
gct cga gaa atc tac gag atg tat gaa gcg gtt agc 276
Ala Arg Glu Ile Tyr Glu Met Tyr Glu Ala Val Ser 85 90

11 228 DNA Artificial Sequence CDS (1)...(228) Description of Artificial Sequence/note = synthetic construct
11
agc aca aaa aag aaa cca tta aca caa gag cag ctt gag gac gca cgt 48
Ser Thr Lys Lys Lys Pro Leu Thr Gln Glu Gln Leu Glu Asp Ala Arg 1 5 10 15
cgc ctt aaa gca att tat gaa aaa aag aaa aat gaa ctt ggc tta tcc 96
Arg Leu Lys Ala Ile Tyr Glu Lys Lys Lys Asn Glu Leu Gly Leu Ser 20 25 30
cag gaa tct gtc gca gac aag atg ggg atg ggg cag tca ggc gtt ggt 144
Gln Glu Ser Val Ala Asp Lys Met Gly Met Gly Gln Ser Gly Val Gly 35 40 45
gct tta ttt aat ggc atc aat gca tta aat gct tat aac gcg gca ttg 192
Ala Leu Phe Asn Gly Ile Asn Ala Leu Asn Ala Tyr Asn Ala Ala Leu 50 55 60
cta gca aaa att ctc aaa gtt agc gtt gaa gaa ttc 228
Leu Ala Lys Ile Leu Lys Val Ser Val Glu Glu Phe 65 70 75

12 108 DNA Artificial Sequence CDS (1)...(108) Description of Artificial Sequence/note = synthetic construct
12
agc aca aaa aag aaa cca tta aca caa gag cag ctt gag gac gca cgt 48
Ser Thr Lys Lys Lys Pro Leu Thr Gln Glu Gln Leu Glu Asp Ala Arg 1 5 10 15
cgc ctt aaa gca att tat gaa aaa aag aaa aat gaa ctt ggc tta tcc 96
Arg Leu Lys Ala Ile Tyr Glu Lys Lys Lys Asn Glu Leu Gly Leu Ser 20 25 30
cag gaa tct gtc 108
Gln Glu Ser Val 35

13 66 DNA Artificial Sequence CDS (1)...(66) Description of Artificial Sequence/note = synthetic construct
13
agc aca aaa aag aaa cca tta aca caa gag cag ctt gag gac gca cgt 48
Ser Thr Lys Lys Lys Pro Leu Thr Gln Glu Gln Leu Glu Asp Ala Arg 1 5 10 15
cgc ctt aaa gca att tat 66
Arg Leu Lys Ala Ile Tyr 20

14 45 DNA Artificial Sequence CDS (1)...(45) Description of Artificial Sequence/note = synthetic construct
14
agc aca aaa aag aaa cca tta aca caa gag cag ctt gag gac gca 45
Ser Thr Lys Lys Lys Pro Leu Thr Gln Glu Gln Leu Glu Asp Ala 1 5 10 15

15 6 PRT Artificial Sequence Description of Artificial Sequence/note = synthetic construct
15
Leu Val Pro Arg Gly Ser 1 5

16 13 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
16
aattcgcagc tca 13

17 11 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
17
tatgagctgc g 11

18 18 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
18
acatcagttg agcactgc 18

19 27 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
19
cctcgaggct tgctgataag gacgagc 27

20 28 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
20
tcgaggcacc atcaccacca ccactgaa 28

21 28 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
21
agctttcagt ggtggtggtg atggtgcc 28

22 18 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
22
tggctggcag tgctcgcc 18

23 30 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
23
tcacctcgag gcctggcaaa acttccgaag 30

24 26 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
24
tcgaacggat ccgcgtggca ccaggc 26

25 18 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
25
agcgctacct ctcgatcg 18

26 32 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
26
gccgcactcg aggcaaggtc agcctgttta ct 32

27 40 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
27
tcgagccacc accaccacca ctctagactg gtgccacgcg 40

28 43 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
28
gatccgcgtg gcaccagtct agagtggtgg tggtggtggt ggc 43

29 31 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
29
gcgccatatg gattataagt gtcaagtcca a 31

30 32 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
30
gccgctcgag gccaagccca cagatatttc ct 32

31 39 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
31
gcgcgaattc accatggaaa tgagacctgc tgtgacttc 39

32 39 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
32
ccgggctcga ggctagcagt gagtcatttg tactacaat 39

33 35 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
33
gggaattcca tatgttcaaa cacctccgaa gatgg 35

34 35 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
34
ccgctcgagg ccatctgggt gtcgtccgtt tcatc 35

35 26 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
35
gcgcgcatat ggctgccatc cggaag 26

36 30 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
36
gccgctcgag gccaagacaa ggcaaccaga 30

37 26 DNA Artificial Sequence Description of Artificial Sequence/note = synthetic construct
37
tcgagcctgg tgccacgcgg atccgt 26
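As an illustrative cross-check between the listing and the claims (not part of the original disclosure), the sketch below translates SEQ ID NO:14 with the standard genetic code and counts its positively charged residues. It assumes the standard codon table and, as a labeled assumption, treats only Lys (K) and Arg (R) as "positively charged"; whether His should also count under claim 1 is not settled by the listing, so the residue set is a parameter. The names `translate` and `count_positive` are hypothetical helpers introduced only for this sketch.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering
# (first base varies slowest). '*' marks a stop codon.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# SEQ ID NO:14 exactly as printed in the listing, with the codon spacing removed.
SEQ_ID_NO_14 = "".join(
    "agc aca aaa aag aaa cca tta aca caa gag cag ctt gag gac gca".split()
)


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon into one-letter codes."""
    dna = dna.lower()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_positive(protein: str, residues: str = "KR") -> int:
    """Count residues treated as positively charged (assumed Lys/Arg by default)."""
    return sum(protein.count(r) for r in residues)


if __name__ == "__main__":
    peptide = translate(SEQ_ID_NO_14)
    print(peptide)                  # STKKKPLTQEQLEDA
    print(len(peptide))             # 15
    print(count_positive(peptide))  # 3
```

Under these assumptions, the 45-nucleotide SEQ ID NO:14 translates to the 15 residues listed for it (Ser Thr Lys Lys Lys Pro Leu Thr Gln Glu Gln Leu Glu Asp Ala). That is the length recited in claim 7, and the three consecutive lysines meet the "at least three positively charged amino acid residues" of claim 1.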

What is claimed is:
 1. An isolated nucleic acid comprising a first nucleotide sequence encoding an amino acid sequence comprising at least three positively charged amino acid residues, positioned upstream and in frame with a second nucleotide sequence encoding a protein.
 2. An isolated nucleic acid comprising a first nucleotide sequence encoding a DNA binding protein, positioned upstream and in frame with a second nucleotide sequence encoding a protein.
 3. The nucleic acid of claim 2, wherein the DNA binding protein is selected from the group consisting of eukaryotic DNA binding proteins, prokaryotic DNA binding proteins and bacteriophage-derived DNA binding proteins.
 4. An isolated nucleic acid comprising a first nucleotide sequence encoding a bacteriophage lambda repressor protein, positioned upstream and in frame with a second nucleotide sequence encoding a protein.
 5. The nucleic acid of claim 4, wherein the first nucleotide sequence encodes the N-terminal domain of the bacteriophage lambda repressor protein.
 6. The nucleic acid of claim 4, wherein the first nucleotide sequence encodes amino acids 1-76 of the bacteriophage lambda repressor protein.
 7. The nucleic acid of claim 4, wherein the first nucleotide sequence encodes at least 15 contiguous amino acids of the N-terminus of the bacteriophage lambda repressor protein.
 8. The nucleic acid of claim 1, wherein the second nucleotide sequence encodes a protein selected from the group consisting of integral membrane proteins, G-protein coupled receptor proteins and ion channel proteins.
 9. The nucleic acid of claim 2, wherein the second nucleotide sequence encodes a protein selected from the group consisting of integral membrane proteins, G-protein coupled receptor proteins and ion channel proteins.
 10. The nucleic acid of claim 3, wherein the second nucleotide sequence encodes a protein selected from the group consisting of integral membrane proteins, G-protein coupled receptor proteins and ion channel proteins.
 11. The nucleic acid of claim 4, wherein the second nucleotide sequence encodes a protein selected from the group consisting of integral membrane proteins, G-protein coupled receptor proteins and ion channel proteins.
 12. The nucleic acid of claim 5, wherein the second nucleotide sequence encodes a protein selected from the group consisting of integral membrane proteins, G-protein coupled receptor proteins and ion channel proteins.
 13. The nucleic acid of claim 6, wherein the second nucleotide sequence encodes a protein selected from the group consisting of integral membrane proteins, G-protein coupled receptor proteins and ion channel proteins.
 14. The nucleic acid of claim 7, wherein the second nucleotide sequence encodes a protein selected from the group consisting of integral membrane proteins, G-protein coupled receptor proteins and ion channel proteins.
 15. The nucleic acid of claim 1, wherein the second nucleotide sequence encodes a protein selected from the group consisting of rabbit prostaglandin E₂ EP₃ receptor protein, human prostaglandin E₂ EP₂ receptor protein, human chemokine receptor CCR-5 protein, human β₂ adrenergic receptor protein, rat renal outer medullary K⁺ channel protein and human small G-protein rho.
 16. The nucleic acid of claim 2, wherein the second nucleotide sequence encodes a protein selected from the group consisting of rabbit prostaglandin E₂ EP₃ receptor protein, human prostaglandin E₂ EP₂ receptor protein, human chemokine receptor CCR-5 protein, human β₂ adrenergic receptor protein, rat renal outer medullary K⁺ channel protein and human small G-protein rho.
 17. The nucleic acid of claim 3, wherein the second nucleotide sequence encodes a protein selected from the group consisting of rabbit prostaglandin E₂ EP₃ receptor protein, human prostaglandin E₂ EP₂ receptor protein, human chemokine receptor CCR-5 protein, human β₂ adrenergic receptor protein, rat renal outer medullary K⁺ channel protein and human small G-protein rho.
 18. The nucleic acid of claim 4, wherein the second nucleotide sequence encodes a protein selected from the group consisting of rabbit prostaglandin E₂ EP₃ receptor protein, human prostaglandin E₂ EP₂ receptor protein, human chemokine receptor CCR-5 protein, human β₂ adrenergic receptor protein, rat renal outer medullary K⁺ channel protein and human small G-protein rho.
 19. The nucleic acid of claim 5, wherein the second nucleotide sequence encodes a protein selected from the group consisting of rabbit prostaglandin E₂ EP₃ receptor protein, human prostaglandin E₂ EP₂ receptor protein, human chemokine receptor CCR-5 protein, human β₂ adrenergic receptor protein, rat renal outer medullary K⁺ channel protein and human small G-protein rho.
 20. The nucleic acid of claim 6, wherein the second nucleotide sequence encodes a protein selected from the group consisting of rabbit prostaglandin E₂ EP₃ receptor protein, human prostaglandin E₂ EP₂ receptor protein, human chemokine receptor CCR-5 protein, human β₂ adrenergic receptor protein, rat renal outer medullary K⁺ channel protein and human small G-protein rho.
 21. The nucleic acid of claim 7, wherein the second nucleotide sequence encodes a protein selected from the group consisting of rabbit prostaglandin E₂ EP₃ receptor protein, human prostaglandin E₂ EP₂ receptor protein, human chemokine receptor CCR-5 protein, human β₂ adrenergic receptor protein, rat renal outer medullary K⁺ channel protein and human small G-protein rho.
 22. The nucleic acid of claim 8, wherein the second nucleotide sequence encodes a protein selected from the group consisting of rabbit prostaglandin E₂ EP₃ receptor protein, human prostaglandin E₂ EP₂ receptor protein, human chemokine receptor CCR-5 protein, human β₂ adrenergic receptor protein, rat renal outer medullary K⁺ channel protein and human small G-protein rho.
 23. An isolated nucleic acid having the nucleotide sequence selected from the group consisting of SEQ ID NO:1 (rabbit prostaglandin E₂ EP₃ receptor protein), SEQ ID NO:2 (human prostaglandin E₂ EP₂ receptor protein), SEQ ID NO:3 (human chemokine receptor CCR-5 protein), SEQ ID NO:4 (human β₂ adrenergic receptor protein), SEQ ID NO:5 (rat renal outer medullary K⁺ channel protein) and SEQ ID NO:6 (human small G-protein rho).
 24. A method of producing a eukaryotic protein in a bacterial cell comprising: a) introducing the nucleic acid of claim 1, wherein the second nucleotide sequence encodes a eukaryotic protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the nucleic acid is expressed to produce the eukaryotic protein.
 25. A method of producing a eukaryotic protein in a bacterial cell in high yield comprising: a) introducing the nucleic acid of claim 1, wherein the second nucleotide sequence encodes a eukaryotic protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the nucleic acid is expressed to produce the eukaryotic protein in high yield.
 26. A method of producing a eukaryotic integral membrane protein in a bacterial cell comprising: a) introducing the nucleic acid of claim 1, wherein the second nucleotide sequence encodes a eukaryotic integral membrane protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the nucleic acid is expressed to produce the eukaryotic integral membrane protein.
 27. A method of producing a eukaryotic G-protein coupled receptor protein in a bacterial cell comprising: a) introducing the nucleic acid of claim 1, wherein the second nucleotide sequence encodes a eukaryotic G-protein coupled receptor protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the nucleic acid is expressed to produce the eukaryotic G-protein coupled receptor protein.
 28. A method of producing a eukaryotic ion channel protein in a bacterial cell comprising: a) introducing the nucleic acid of claim 1, wherein the second nucleotide sequence encodes a eukaryotic ion channel protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the nucleic acid is expressed to produce the eukaryotic ion channel protein.
 29. A method of producing a rabbit prostaglandin E₂ EP₃ receptor protein in a bacterial cell comprising: a) introducing the nucleic acid of claim 1, wherein the second nucleotide sequence encodes the rabbit prostaglandin E₂ EP₃ receptor protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the nucleic acid is expressed to produce the rabbit prostaglandin E₂ EP₃ receptor protein.
 30. A method of producing a human prostaglandin E₂ EP₂ receptor protein in a bacterial cell comprising: a) introducing the nucleic acid of claim 1, wherein the second nucleotide sequence encodes the human prostaglandin E₂ EP₂ receptor protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the nucleic acid is expressed to produce the human prostaglandin E₂ EP₂ receptor protein.
 31. A method of producing a human chemokine receptor CCR-5 protein in a bacterial cell comprising: a) introducing the nucleic acid of claim 1, wherein the second nucleotide sequence encodes the human chemokine receptor CCR-5 protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the nucleic acid is expressed to produce the human chemokine receptor CCR-5 protein.
 32. A method of producing a human β₂ adrenergic receptor protein in a bacterial cell comprising: a) introducing the nucleic acid of claim 1, wherein the second nucleotide sequence encodes the human β₂ adrenergic receptor protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the nucleic acid is expressed to produce the human β₂ adrenergic receptor protein.
 33. A method of producing a rat renal outer medullary K⁺ channel protein in a bacterial cell comprising: a) introducing the nucleic acid of claim 1, wherein the second nucleotide sequence encodes the rat renal outer medullary K⁺ channel protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the nucleic acid is expressed to produce the rat renal outer medullary K⁺ channel protein.
 34. A method of producing a human small G-protein rho protein in a bacterial cell comprising: a) introducing the nucleic acid of claim 1, wherein the second nucleotide sequence encodes the small G-protein rho protein, into the bacterial cell; and b) culturing the bacterial cell under conditions whereby the second nucleotide sequence of the nucleic acid is expressed to produce the small G-protein rho protein.